Friday, 6 May 2022

Patterns of genetic differentiation and the footprints of historical migrations in the Iberian Peninsula

 

The Iberian Peninsula is linguistically diverse and has a complex demographic history, including a centuries-long period of Muslim rule. Here, we study the fine-scale genetic structure of its population, and the genetic impacts of historical events, leveraging powerful, haplotype-based statistical methods to analyse 1413 individuals from across Spain. We detect extensive population structure at extremely fine scales (below 10 km) in some regions, including Galicia. We identify a major east-west axis of genetic differentiation, and evidence of historical north-to-south population movement. We find regionally varying fractions of north-west African ancestry (0–11%) in modern-day Iberians, related to an admixture event involving European-like and north-west African-like source populations. We date this event to 860–1120 CE, implying greater genetic impacts in the early half of Muslim rule in Iberia. Together, our results indicate clear genetic impacts of population movements associated with both the Muslim conquest and the subsequent Reconquista.

Introduction

Genetic differentiation within or between human populations (population structure) has been studied using a variety of approaches over many years [1–5]. Recently there has been an increasing focus on studying genetic differentiation at fine geographic scales, such as within countries [6–8]. Identifying such structure allows the study of recent population history, and identifies the potential for confounding in association studies, particularly when testing rare, often recently arisen variants [9]. The Iberian Peninsula is linguistically diverse, has a complex demographic history, and is unusual among European regions in having a centuries-long period of Muslim rule [10].

Previous studies of population structure in Spain have examined either a small fraction of the genome [11–13] or only a few regions of Spain [14,15], and typically compare groups of individuals defined a priori using broad ethnic or geographic labels, such as autonomous community. Using such approaches, only limited population structure within Iberia has been identified [15–19]. Some structure within northern Spain has been detected, including statistically significant differences in frequencies of Y-chromosome haplotypes and other genetic markers between the Basque-speaking regions and other parts of Iberia [11,12], a result consistent with a European-wide analysis using autosomal DNA [20]. Studies of Spain that used genome-wide data did not leverage information in correlations between genetic markers [14,15], excepting one study [21], which detected a cline of variation broadly distinguishing samples in País Vasco from other parts of northern Spain, especially Galicia, but no evidence of sub-structure in central or southern Spain. Thus the overall pattern of population structure within Spain, including subtle structure at fine geographic scales, remains uncharacterized.

The cultural and linguistic impact of Muslim rule in Iberia is well-documented, but the historical record is limited in its ability to inform about the extent, timing and geographic spread of genetic mixing between immigrants and indigenous Iberians over several centuries after the initial conquest [22]. Previous genetic studies have reported signals of admixture from sub-Saharan Africa and/or north Africa into Iberia at some point in the past [23–27]. However, estimates of the timing of this admixture vary greatly, from as long as 74 generations ago (~100 BC) [23] to 23 generations ago (~1330 CE) [25]. Estimates of overall mean proportions of African-like DNA in the Iberian Peninsula also vary, ranging from 2.4% [24] to 10.6% [11]. Differences within Iberia have also been reported [11,26], based on comparisons between sampled regions, with higher fractions observed in western regions of Iberia (e.g. 21.7% in Northwest Castile [11]) and lower fractions in the north-east (e.g. 2.3% in Cataluña [11]). Estimates of the timing and extent of admixture tend to vary depending on the reference populations assumed to represent the ancestral mixing groups (e.g. Moroccan [11] or Saharawi [26]), as well as heterogeneity in the ancestral make-up of the modern-day Iberian samples used in the analysis.

Here we analyse genome-wide genotyping array data for 1413 Spanish individuals sampled from across Spain. By using powerful, haplotype-based statistical methods we identify extensive fine-scale structure down to scales <10 km in some places. We identify a major axis of genetic differentiation that runs from east to west across Iberia. In contrast, we observe remarkable genetic similarity in the north–south direction, and evidence of historical north–south population movement. Finally, we sought to clarify the timing and composition of African-like and potentially non-African genetic contributions to the Iberian Peninsula, by jointly analysing genotype data sourced from a wide range of African and European regions. We show that modern Spanish people have regionally varying fractions of ancestry from a group most similar to modern north-west Africans. This African ancestry, identified without making particular prior assumptions about source populations, results from an admixture event that we date to 860–1120 CE, corresponding to the early half of Muslim rule. Our results indicate that it is possible to discern clear genetic impacts of the Muslim conquest and population movements associated with the subsequent Reconquista.

Results

Extensive fine-scale population structure in Spain

We analysed phased genotyping array data for 1413 Spanish individuals typed at 693,092 autosomal single nucleotide polymorphisms (SNPs) after quality control (Methods). We applied fineSTRUCTURE [28] to these data to infer clusters of individuals with similar patterns of shared ancestry (Methods). fineSTRUCTURE inferred 145 distinct clusters, along with a hierarchical tree describing relationships between the clusters (Fig. 1a; Methods). We used genetic data only in the inference, but explored the relationship between genetic structure and geography using a subset of 726 individuals for whom geographic information was available and all four grandparents were born within 80 km of the centroid of their birthplaces. Figure 1b represents each of these individuals as a point on a map of Spain, located at the centroid of their grandparents’ birthplaces and labelled according to their cluster assignment after combining small clusters at the bottom of the tree (Methods). Their grandparents were likely to have been born in the decades either side of 1900 (median birth-year of the cohort is 1941), so the spatial distribution of genetic structure described in this study would reflect that of Spain around that time.
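
The geographic filter described above (all four grandparents born within 80 km of the centroid of their birthplaces) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the use of a simple arithmetic mean of latitude and longitude as the centroid, the haversine distance, and the example coordinates are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def centroid(points):
    """Arithmetic mean of (lat, lon) pairs; an approximation that is reasonable
    when the points are at most a few hundred km apart (assumption)."""
    lats, lons = zip(*points)
    return sum(lats) / len(lats), sum(lons) / len(lons)


def is_locally_rooted(birthplaces, max_km=80.0):
    """True if every birthplace lies within max_km of the centroid of all birthplaces."""
    c_lat, c_lon = centroid(birthplaces)
    return all(haversine_km(lat, lon, c_lat, c_lon) <= max_km for lat, lon in birthplaces)


# Hypothetical grandparent birthplaces as (latitude, longitude); not data from the study.
local = [(42.43, -8.64), (42.24, -8.72), (42.60, -8.77), (42.34, -8.50)]
mixed = [(42.43, -8.64), (42.24, -8.72), (41.39, 2.17), (37.39, -5.99)]

print(is_locally_rooted(local))  # four nearby birthplaces
print(is_locally_rooted(mixed))  # birthplaces spread across the country
```

An individual passing this check would be plotted at the computed centroid; one failing it would still contribute to the clustering but not to the map.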

 

Fig. 1 Spanish individuals grouped into clusters using genetic data only. (a) Binary tree showing the hierarchical relationships between clusters inferred using genotype data of 1413 individuals (fineSTRUCTURE analysis A). The colours and points correspond to the clusters shown on the map, and the length of the coloured rectangles is proportional to the number of individuals assigned to that cluster. We combined some small clusters (Methods) and the thick black branches indicate the clades of the tree that we visualise in the map. Clusters are labelled according to the approximate location of most of their members, but geographic data was not used in the inference. (b) Each individual (n = 726) is represented by a point placed at (or close to, <24 km) the centroid of their grandparents’ birthplaces. We only plot the individuals for whom all four grandparents were born within 80 km of their average birthplace, although the data for all individuals were used in the fineSTRUCTURE inference. The background is coloured according to the spatial densities of each cluster at the level of the tree where there are 14 clusters (Methods). The colour and symbol of each point correspond to the cluster the individual was assigned to at a lower level of the tree, as shown in (a). Spain’s autonomous communities are also shown. (c) A representation of changes in the linguistic and political boundaries in Iberia from ~930 to 1300 CE, adapted with permission from maps by Baldinger [29]. Different linguistic areas are shown with the colours and shading, and political boundaries with white borders (in the far right map only). Only the colours and labels of the Christian kingdoms have been added to aid visualisation.

These results reveal patterns of rich fine-scale population structure in Spain. At the coarsest level of genetic differentiation (i.e. two clusters at the top of the hierarchy) individuals located in a small region in south-west Galicia are separated from those in the rest of Spain. The next level separates individuals located primarily in the Basque regions in the north (País Vasco and Navarra) from the rest of Spain. Further down the tree (background colours in Fig. 1b) many of the clusters closely follow the east–west boundaries of Spain’s autonomous communities, especially in the north of Spain. However, in the north–south direction several clusters cross boundaries of multiple autonomous communities. Overall, the major axis of genetic differentiation runs from east to west, while conversely there is remarkable genetic similarity in the north–south direction. In a complementary analysis that included Portugal, although with fewer SNPs (Methods), Portuguese individuals co-clustered with individuals in Galicia (Fig. 2a), showing that this pattern extends across the whole Iberian Peninsula. Indeed, rather than mainly reflecting modern-day political boundaries (autonomous communities), the broad-scale genetic structure of the region is strikingly similar to the linguistic frontiers [29] present in the Iberian Peninsula around 1300 CE (Fig. 1c). Via more formal simulation-based testing, we confirmed this: the association of genetic structure with language is statistically significant (p < 0.008), even after accounting for both physical distance and autonomous community membership (Supplementary Note 9; Supplementary Figure 8). Conversely, once physical distance and language are taken into account, no significant association with autonomous community remains (p = 0.12).
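
To give a feel for what simulation-based testing of an association between genetic clusters and language areas can look like, here is a deliberately simplified permutation test. It is not the procedure of Supplementary Note 9: it does not adjust for physical distance or autonomous community, and the test statistic, function names and toy labels are all assumptions made for illustration.

```python
import random
from itertools import combinations


def same_label_pair_count(clusters, zones):
    """Count pairs of individuals that share both a genetic cluster and a language zone."""
    return sum(
        1
        for i, j in combinations(range(len(clusters)), 2)
        if clusters[i] == clusters[j] and zones[i] == zones[j]
    )


def permutation_p_value(clusters, zones, n_perm=999, seed=0):
    """One-sided permutation p-value for concordance between clusters and zones.

    Language-zone labels are shuffled across individuals, which breaks any real
    association while keeping the number of individuals per zone fixed.
    """
    rng = random.Random(seed)
    observed = same_label_pair_count(clusters, zones)
    shuffled = list(zones)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if same_label_pair_count(clusters, shuffled) >= observed:
            at_least_as_extreme += 1
    # The observed labelling is counted as one of the permutations,
    # so the smallest attainable p-value is 1 / (n_perm + 1).
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy, made-up labels in which clusters and zones coincide perfectly.
clusters = ["A"] * 6 + ["B"] * 6
zones = ["x"] * 6 + ["y"] * 6

p = permutation_p_value(clusters, zones)
print(f"permutation p-value: {p:.3f}")
```

Controlling for distance or community membership, as the paper reports doing, would require restricting or modelling the permutations (for example, shuffling only among individuals at similar distances), which this sketch leaves out.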

Fig. 2 Clustering analysis including Portuguese individuals, and large clusters at the bottom of the tree. (a) This map and tree show clusters inferred by fineSTRUCTURE (analysis B) that included data from Portuguese individuals but using a smaller set of SNPs (Methods). As in Fig. 1b we show the level of the tree such that all clusters contain at least 15 individuals (39 clusters). Points representing 843 individuals are shown on this map but, as with analysis A, data for all Portuguese and Spanish individuals (1530) were used in the inference. Positions of points and background colours are determined using the same procedure as for Fig. 1b (Methods), with the exception of Portugal. No fine-scale geographic information was available for the Portuguese individuals, so we placed them randomly within the boundaries of Portugal and show a single background colour. (b) This map shows the geographic spread of the three large clusters that remain at the bottom of the tree inferred in the Spain-only fineSTRUCTURE analysis (see main text; Fig. 1a). These clusters each contain more than 100 individuals out of the full set of 1413. The accompanying tree highlights the three clusters within the full tree structure. The width of the coloured rectangles is proportional to the number of individuals belonging to each cluster (yellow = 222; orange = 165; red = 123).

Although some geographically dispersed clusters (e.g. ‘central’ and ‘west’) remain largely intact at the bottom of the hierarchical tree (Fig. 2b), many of the clusters that emerge further down the tree involve greater geographical localisation. By far the strongest sub-structure is seen within a single province in Galicia, Pontevedra, which contains almost half of the inferred clusters in all of Spain (Fig. 1a). This ultra-fine structure is seen across scales of <10 km and the clusters align with regions defined by hills and/or river valleys (Fig. 3a). This structure is not an artefact of the denser sampling in this region, as it was still evident in an analysis after sub-sampling