Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula.
The Iberian Peninsula is linguistically diverse and has a complex
demographic history, including a centuries-long period of Muslim rule.
Here, we study the fine-scale genetic structure of its population, and
the genetic impacts of historical events, leveraging powerful,
haplotype-based statistical methods to analyse 1413 individuals from
across Spain. We detect extensive population structure at extremely
fine scales (below 10 km) in some regions, including Galicia.
We identify a major east–west axis of genetic differentiation, and
evidence of historical north-to-south population movement. We find
regionally varying fractions of north-west African ancestry (0–11%) in
modern-day Iberians, related to an admixture event involving
European-like and north-west African-like source populations. We date
this event to 860–1120 CE, implying greater genetic impacts in the early
half of Muslim rule in Iberia. Together, our results indicate clear
genetic impacts of population movements associated with both the Muslim
conquest and the subsequent Reconquista.
Introduction
Genetic differentiation within or between human populations (population structure) has been studied using a variety of approaches over many years1,2,3,4,5. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries6,7,8. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants9. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unusual among European regions in having a centuries-long period of Muslim rule10.
Previous studies of population structure in Spain have examined either a small fraction of the genome11,12,13 or only a few regions of Spain14,15, and typically compare groups of individuals defined a priori using broad ethnic or geographic labels, such as autonomous community. Using such approaches only limited population structure within Iberia has been identified15,16,17,18,19. Some structure within northern Spain has been detected, including statistically significant differences in frequencies of Y-chromosome haplotypes and other genetic markers between the Basque-speaking regions and other parts of Iberia11,12, a result consistent with a European-wide analysis using autosomal DNA20. Studies of Spain that used genome-wide data did not leverage information in correlations between genetic markers14,15, excepting one study21, which detected a cline of variation broadly distinguishing samples in País Vasco from other parts of northern Spain, especially Galicia, but no evidence of sub-structure in central or southern Spain. Thus the overall pattern of population structure within Spain—including subtle structure at fine geographic scales—remains uncharacterized.
The cultural and linguistic impact of Muslim rule in Iberia is well documented, but the historical record is limited in its ability to inform about the extent, timing and geographic spread of genetic mixing between immigrants and indigenous Iberians over several centuries after the initial conquest22. Previous genetic studies have reported signals of admixture from sub-Saharan Africa and/or north Africa into Iberia at some point in the past23,24,25,26,27. However, estimates of the timing of this admixture vary greatly, from as long as 74 generations ago (~100 BCE)23 to 23 generations ago (~1330 CE)25. Estimates of overall mean proportions of African-like DNA in the Iberian Peninsula also vary, ranging from 2.4%24 to 10.6%11. Differences within Iberia have also been reported11,26, based on comparisons between sampled regions, with higher fractions observed in western regions of Iberia (e.g. 21.7% in Northwest Castile11) and lower fractions in the north-east (e.g. 2.3% in Cataluña11). Estimates of the timing and extent of admixture tend to vary depending on the reference populations assumed to represent the ancestral mixing groups (e.g. Moroccan11 or Saharawi26), as well as on heterogeneity in the ancestral make-up of the modern-day Iberian samples used in the analysis.
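The conversion between an admixture time in generations and an approximate calendar date, as quoted above, is simple arithmetic. The sketch below illustrates it under two assumptions that are not stated in this text: a generation time of 28 years and a reference year of 1970. Both values, and the function name `generations_to_year`, are hypothetical choices for illustration; the cited studies may have used different conventions.

```python
def generations_to_year(generations_ago, generation_years=28.0, reference_year=1970):
    """Convert an admixture time in generations to an approximate calendar year.

    generation_years and reference_year are illustrative assumptions,
    not values taken from the studies cited in the text.
    Negative results are (approximately) years BCE.
    """
    return reference_year - generations_ago * generation_years


# Under these assumed values, the two published extremes come out close
# to the dates quoted in the text.
print(generations_to_year(74))  # -102.0, i.e. roughly 100 BCE
print(generations_to_year(23))  # 1326.0, i.e. roughly 1330 CE
```

Because the result is linear in the generation time, changing that assumption by one year shifts the older estimate by 74 years but the younger one by only 23, which is one reason older admixture dates are less certain in calendar terms.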
Here we analyse genome-wide genotyping array data for 1413 individuals sampled from across Spain. By using powerful, haplotype-based statistical methods we identify extensive fine-scale structure down to scales <10 km in some places. We identify a major axis of genetic differentiation that runs from east to west across Iberia. In contrast, we observe remarkable genetic similarity in the north–south direction, and evidence of historical north–south population movement. Finally, we seek to clarify the timing and composition of African-like and potentially non-African genetic contributions to the Iberian Peninsula, by jointly analysing genotype data sourced from a wide range of African and European regions. We show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north-west Africans. This African ancestry, identified without making particular prior assumptions about source populations, results from an admixture event that we date to 860–1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.
Results
Extensive fine-scale population structure in Spain
We analysed phased genotyping array data for 1413 Spanish individuals typed at 693,092 autosomal single nucleotide polymorphisms (SNPs) after quality control (Methods). We applied fineSTRUCTURE28 to these data to infer clusters of individuals with similar patterns of shared ancestry (Methods). fineSTRUCTURE inferred 145 distinct clusters, along with a hierarchical tree describing relationships between the clusters (Fig. 1a; Methods). We used genetic data only in the inference, but explored the relationship between genetic structure and geography using a subset of 726 individuals for whom geographic information was available and all four grandparents were born within 80 km of the centroid of their birthplaces. Figure 1b represents each of these individuals as a point on a map of Spain, located at the centroid of their grandparents’ birthplaces and labelled according to their cluster assignment after combining small clusters at the bottom of the tree (Methods). Their grandparents were likely to have been born in the decades either side of 1900 (median birth-year of the cohort is 1941), so the spatial distribution of genetic structure described in this study would reflect that of Spain around that time.
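The geographic filter described above (all four grandparents born within 80 km of the centroid of their birthplaces) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes birthplaces are given as (latitude, longitude) pairs in degrees, takes the centroid as the plain arithmetic mean of the coordinates, and measures distance with the haversine formula. The function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; an assumption that is adequate
    for points spread over tens to hundreds of km at Iberian latitudes."""
    lat = sum(p[0] for p in points) / len(points)
    lon = sum(p[1] for p in points) / len(points)
    return lat, lon


def grandparents_are_local(birthplaces, max_km=80.0):
    """True if there are exactly four birthplaces and each lies within
    max_km of their centroid."""
    if len(birthplaces) != 4:
        return False
    c_lat, c_lon = centroid(birthplaces)
    return all(haversine_km(c_lat, c_lon, lat, lon) <= max_km
               for lat, lon in birthplaces)


# Hypothetical example: four birthplaces in Galicia, then one replaced by Madrid.
galicia = [(42.43, -8.64), (42.24, -8.72), (42.88, -8.54), (42.34, -7.86)]
mixed = galicia[:3] + [(40.42, -3.70)]
print(grandparents_are_local(galicia))  # True
print(grandparents_are_local(mixed))    # False
```

An individual passing this filter would be plotted at the value returned by `centroid`, mirroring how points are placed in the map described in the text.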
Fig. 1 Spanish individuals grouped into clusters using genetic data only. a Binary tree showing the hierarchical relationships between clusters inferred using genotype data of 1413 individuals (fineSTRUCTURE analysis A). The colours and points correspond to the clusters shown on the map, and the length of the coloured rectangles is proportional to the number of individuals assigned to that cluster. We combined some small clusters (Methods) and the thick black branches indicate the clades of the tree that we visualise in the map. Clusters are labelled according to the approximate location of most of their members, but geographic data were not used in the inference. b Each individual (n = 726) is represented by a point placed at (or close to, <24 km) the centroid of their grandparents’ birthplaces. We only plot the individuals for whom all four grandparents were born within 80 km of their average birthplace, although the data for all individuals were used in the fineSTRUCTURE inference. The background is coloured according to the spatial densities of each cluster at the level of the tree where there are 14 clusters (Methods). The colour and symbol of each point correspond to the cluster the individual was assigned to at a lower level of the tree, as shown in a. Spain’s autonomous communities are also shown. c A representation of changes in the linguistic and political boundaries in Iberia from ~930 to 1300 CE, adapted with permission from maps by Baldinger29. Different linguistic areas are shown with the colours and shading, and political boundaries with white borders (in the far right map only). Only the colours and labels of the Christian kingdoms have been added to aid visualisation.
These results reveal patterns of rich fine-scale population structure in
Spain. At the coarsest level of genetic differentiation (i.e. two
clusters at the top of the hierarchy) individuals located in a small
region in south-west Galicia are separated from those in the rest of
Spain. The next level separates individuals located primarily in the
Basque regions in the north (País Vasco and Navarra) from the rest of
Spain. Further down the tree (background colours in Fig. 1b)
many of the clusters closely follow the east–west boundaries of Spain’s
autonomous communities, especially in the north of Spain. However, in
the north–south direction several clusters cross boundaries of multiple
autonomous communities. Overall, the major axis of genetic
differentiation runs from east to west, while conversely there is
remarkable genetic similarity in the north–south direction. In a
complementary analysis that included Portugal, although with fewer SNPs
(Methods), Portuguese individuals co-clustered with individuals in
Galicia (Fig. 2a),
showing that this pattern extends across the whole Iberian Peninsula.
Indeed, rather than mainly reflecting modern-day political boundaries
(autonomous communities), the broad-scale genetic structure of the
region is strikingly similar to the linguistic frontiers29 present in the Iberian Peninsula around 1300 CE (Fig. 1c).
Via more formal simulation-based testing, we confirmed this: the
association of genetic structure with language is statistically
significant (p < 0.008), even after accounting for both physical distance and autonomous community membership (Supplementary Note 9; Supplementary Figure 8).
Conversely, once physical distance and language are taken into account,
no significant association with autonomous community remains (p = 0.12).
Fig. 2 Clustering analysis including Portuguese individuals, and large clusters at the bottom of the tree. a
This map and tree show clusters inferred by fineSTRUCTURE (analysis B)
that included data from Portuguese individuals but using a smaller set
of SNPs (Methods). As in Fig. 1b,
we show the level of the tree at which all clusters contain at least 15
individuals (39 clusters). Points representing 843 individuals are shown
on this map but, as with analysis A, data for all Portuguese and
Spanish individuals (1530) were used in the inference. Positions of
points and background colours are determined using the same procedure as
for Fig. 1b (Methods),
with the exception of Portugal. No fine-scale geographic information
was available for these individuals, so we placed them randomly within
the boundaries of Portugal and show a single background colour. b
This map shows geographic spread of the three large clusters that
remain at the bottom of the tree inferred in the Spain-only
fineSTRUCTURE analysis (see main text; Fig. 1a).
These clusters each contain more than 100 individuals out of the full
set of 1413. The accompanying tree highlights the three clusters within
the full tree structure. The width of the coloured rectangles is
proportional to the number of individuals belonging to each cluster
(yellow = 222; orange = 165; red = 123).
Although some geographically dispersed clusters (e.g. ‘central’ and
‘west’) remain largely intact at the bottom of the hierarchical tree
(Fig. 2b),
many of the clusters that emerge further down the tree involve greater
geographical localisation. By far the strongest sub-structure is seen
within a single province in Galicia, Pontevedra, which contains almost
half of the inferred clusters in all of Spain (Fig. 1a).
This ultra-fine structure is seen across scales of <10 km and the
clusters align with regions defined by hills and/or river valleys (Fig. 3a).
This structure is not an artefact of the denser sampling in this
region, as it was still evident in an analysis after sub-sampling