Detecting Remote Historical Signal

Recent years have seen highly successful attempts to adapt phylogenetic computational models to track language relatedness, date family trees of languages, and determine their geographical location at different nodes of the tree.

These methods have primarily used lexical data, on the assumption that changes in basic vocabulary occurring over time can reveal historical signal. A persistent worldwide problem in using linguistic data to build models of the deep human past has been the ‘time barrier’, with signals of relatedness between languages usually stated to fade beyond detection after around 10,000 years. This contrasts with archaeology, where the 40kya barrier associated with radiocarbon dating was pushed back once new techniques like thermoluminescence dating and molecular genetics were developed. These made it possible to build phylogenies going back to the earliest humans and beyond.

An alternative approach is to infer phylogenies and contact from a wider range of language features. One of the most promising advances for detecting deep signal has been the development over the last five years of the Glottobank Consortium, under the triple auspices of the Centre of Excellence for the Dynamics of Language at the Australian National University, the Department of Linguistic and Cultural Evolution at the Max Planck Institute for the Science of Human History in Jena, and the University of Auckland. Classic methods look for relatedness based on regular sound-relationships between the forms of words and their parts (e.g. English thou and Sanskrit tvám, English we and Sanskrit vayám, English us and Sanskrit asmān, English you and Sanskrit yúyām, etc.). But these similarities eventually get worn away by the effects of change – thou has effectively disappeared from modern English, and the Hindi words are nowhere near as clearly related as the Sanskrit ones are.

Glottobank adopts a radically different approach, by compiling vast databases of linguistic structures likely to be found in all languages, and therefore not prone to loss in the same way. So far it has amassed data for over 1,000 of the world’s 7,000 languages, and the steady work of building this over the last five years will soon move to the next phase, that of developing methods for analysing this information to answer questions about the languages both of Oceania and of other parts of the world. One part of Glottobank, namely Parabank, contains information about paradigms – organised multidimensional patternings of words – and kin terms form a key set of paradigms. So the data here are useful not just for historical linguistic analysis but also for plotting kinship systems – with the implications these have for social structure – across the many hundreds of cultures for which we have descriptions; this will interface closely with methods §1 and §2 above. The next step in our analysis is to link in cognate sets across the 1,000+ individual languages represented in Parabank, to help us (i) develop a typology of pattern-change, (ii) understand the effects of contact (both direct borrowing of forms and borrowing of patterns), and (iii) harness this new approach to model deep-time patterns of historical connection and change. For example, initial work comparing the patterning of pronoun paradigms across the Torres Strait gives a very neat cleavage first at hemi-continental level and then by phylogenetic groupings on either side, but modulated by tantalising contact effects between the Western Torres Strait language and Kiwai.
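
As a purely illustrative sketch of what comparing the ‘patterning’ of paradigms might look like computationally, the Python below treats a paradigm as a mapping from cells to forms, reduces it to the set of cell pairs that share a form, and measures how often two paradigms disagree about which cells pattern together. This is not the Glottobank or Parabank method, which the text above does not specify. The function names, cell labels and toy paradigms are all invented for illustration, and the distance used (one minus the Rand index) is simply one convenient choice.

```python
from itertools import combinations


def syncretism_pattern(paradigm):
    """Return the set of cell pairs that share a form.

    Only the pattern of identity between cells is kept, so the
    actual forms (which erode over time) play no further role.
    """
    cells = sorted(paradigm)
    return {(a, b) for a, b in combinations(cells, 2)
            if paradigm[a] == paradigm[b]}


def pattern_distance(p1, p2):
    """Fraction of cell pairs on which two paradigms disagree
    about whether the cells share a form (1 - Rand index)."""
    if set(p1) != set(p2):
        raise ValueError("paradigms must cover the same cells")
    pairs = list(combinations(sorted(p1), 2))
    if not pairs:
        return 0.0
    s1, s2 = syncretism_pattern(p1), syncretism_pattern(p2)
    disagreements = sum((pair in s1) != (pair in s2) for pair in pairs)
    return disagreements / len(pairs)


# Hypothetical toy paradigms (not real language data): the forms are
# arbitrary placeholders, and only their identity within a paradigm matters.
lang_a = {"1sg": "a", "2sg": "b", "1pl": "c", "2pl": "c"}
lang_b = {"1sg": "x", "2sg": "y", "1pl": "z", "2pl": "z"}
lang_c = {"1sg": "m", "2sg": "m", "1pl": "n", "2pl": "o"}

print(pattern_distance(lang_a, lang_b))  # same patterning, unrelated forms
print(pattern_distance(lang_a, lang_c))  # different patterning
```

In this toy setup, the first two paradigms share no forms yet come out as identical in patterning, which is the property that makes structural comparison attractive once the forms themselves have diverged beyond recognition.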

Through analyses like this we plan to develop methods of detecting historical signal in compact but highly organised parts of the semantic system, in ways that transcend the limitations of traditional methods and that generalise to the relations between ‘maximal clades’ – the largest groupings reconstructable by traditional methods – which is where the linguistic trail currently runs cold.

Research theme lead: Nick Evans