1. Introduction: Eastern Origins of English

1.1 The Narrow Land

English, a language spoken in England, gets its name from the Angles, a Germanic people who began settling in Britain in around the 5th century. They came from the Angeln region of south-east Jutland, in northern Germany just south of the present-day border with Denmark. To the north of them in Jutland, in present-day Denmark, lived the Jutes, and to the south of them, in Saxony (present-day North-West Germany), the Saxons.

Angles, Saxons and Jutes spoke distinct but closely related dialects of the same language. The Saxons and Jutes may have called their speech Saxon and Jutish, but we now label Anglian, Saxon and Jutish all as “Old English” or “Anglo-Saxon”.

Following their migrations to Britain, the Anglian dialect of Old English was spoken in the Midlands (the kingdom of Mercia), in the North of England/southern Scotland (the kingdom of Northumbria), and in East Anglia. The Saxon dialect was spoken in the southern and western kingdoms and regions of Essex, Wessex, Middlesex/Surrey, and Sussex, and the Jutish dialect was spoken in Kent and the Isle of Wight.

Nowadays, the first vowel of English is usually pronounced like the second, as if spelled Inglish [ɪŋɡlɪʃ]; the Old English spelling englisc was pronounced [eŋɡlɪʃ], more like the present-day German pronunciation Englisch.

The fact that Englisc is also spelled Ænglisc or Anglisc, that the people are named Engel, Ængle or Engle, and that the names of the lands with which they are associated, both in Germany (Angeln) and in Britain (e.g. East Anglia), begin with A, shows that the initial [e] of Englisc was previously pronounced with a more open vowel, [æ] or [a].

The name of their Germanic homeland, Angeln, probably originates from a Germanic root *ang- meaning “narrow”, i.e. “the narrow land”, perhaps relating to its lying on a kind of peninsula (Figure 1).


Figure 1. Location of the Angeln region on the Jutland peninsula, in northern Germany just south of the present-day border with Denmark. By Hel-hama - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=20126198

The Old English words enge and ang-nægl (“ang-nail”, modern English agnail or hangnail), Middle English ange, modern German and Dutch eng and similar words in other modern Germanic dialects, all mean “narrow”; they derive from Proto-Germanic *angwu or *angu. We can infer that the pronunciation of the eng- part of Old English enge was the same as or very similar to the modern German and Dutch pronunciations of eng.


1.2 Ancestors of *ang

Related words in other Indo-European languages, such as Ancient Greek ἄγχω [aŋkho:], Latin angō, and Sanskrit अंहु aṃhú-, all with meanings such as “narrow”, “tight” or “constrained”, have led philologists to reconstruct a Proto-Indo-European ancestral root *h₂emǵʰ-u-, pronounced something like [aŋg̟ʱu], or at an earlier stage perhaps [ħaŋg̟ʱu]. (See section 4.6 for extended discussion of the possible pronunciations of *h in Proto-Indo-European.)

In some branches of Indo-European, this root turns up in rather concealed guises, e.g. in Slavic languages such as Russian узкий [úzki:] or Bosnian, Croatian, Montenegrin and Serbian uzak. How did [aŋg̟ʱ-] become [uz-] in those languages? Note first of all that the consonant [g̟ʱ] is an advanced or fronted velar plosive, that is a kind of [g] like that found in English before the vowel [i], as at the start of the word gear [g̟i‒]. In a large group of languages including Slavic, Baltic, and Indo-Iranian languages, this [g̟] became first a palatal sound, eventually becoming like the j ([ʤ]) in judge [ʤʌʤ], then in some words that consonant became a [ʒ] as in pleasure (i.e. plea[ʒ]ure) and eventually a [z]. Such a sequence of sound changes is quite similar to that seen in some “English” words (adapted from Classical Greek, some time ago), e.g. analogue (which ends in [g]) + suffix -y [g̟i] → analogy, now pronounced with final syllable [ʤi], not [gi]. Compare also the French pronunciation of analogie, with final [-ʒi]. Or oesophagus (with [g] near the end) → oesophageal, i.e. oesopha[ʤi]al.

A change from [ʒ] to [z] is not seen directly in English phonetics, though it is found in other languages. However, the reverse development, from [zi] to [ʒ], can be heard in e.g. vision pronounced vi[ʒ]on, though the spelling indicates that it was previously pronounced vi[zi]on.

Sound changes parallel to [g̟i] > [ʤi] > [ʒi] > [zi] are also seen with the voiceless consonant [k], i.e. the historical palatalization of [k̟i] to [ʧi], thence > [ʃi] > [si], as in: electric (ends in [k]) + suffix -ian → electrician, pronounced electri[ʃ]ian not electri[k]ian; electri[k] + suffix -ity → electricity, i.e. electri[s]ity not electri[k]ity. For further discussion and more examples of such changes, see section 4.4.1 Satemization. Proto-Indo-European *h₂emǵʰ-u- developed into Proto-Balto-Slavic *anźu-, through palatalization and softening, ǵ > z, as described above.
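The two chains described above can be laid side by side to make the parallel explicit. The following Python sketch is only an illustration: the list and function names are invented here, and the stages are simply the IPA symbols quoted in the text, not a claim about the software used for the audio simulations.

```python
# Successive stages of the two chains, in IPA (earliest first).
VOICED = ["g̟", "ʤ", "ʒ", "z"]
VOICELESS = ["k̟", "ʧ", "ʃ", "s"]


def steps(chain):
    """Return the (earlier, later) pairs of adjacent stages in a chain."""
    return list(zip(chain, chain[1:]))


def render(chain, vowel="i"):
    """Format a chain before a following vowel, joined with ' > '."""
    return " > ".join(f"[{sound}{vowel}]" for sound in chain)


if __name__ == "__main__":
    print(render(VOICED))
    print(render(VOICELESS))
    # Each voiced stage lines up with a voiceless counterpart.
    for voiced, voiceless in zip(VOICED, VOICELESS):
        print(f"{voiced} : {voiceless}")
```

Each chain has three steps, and at every stage the voiced and voiceless sounds differ only in voicing.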

So much for how the Proto-Indo-European [g̟ʱ] in [aŋg̟ʱ-] became [z] in Slavic languages. Now, how did the initial vowel [a] become [u] in e.g. uzak? The combination of [a]+nasal consonant developed into a nasalized vowel [ã], a process that is common across Indo-European languages and indeed in many other languages of the world. This nasalized vowel [ã] was pronounced like the Modern French word an (year). In French, it is only a very small change in pronunciation from an (year) to on (the pronoun “one”), [õ], involving only a slight rounding of the lips.

In Old Church Slavonic (Old Cyrillic script), the word for “narrow” is ѫзъкъ, usually transcribed by historical Slavicists as ǫzŭkŭ; in the International Phonetic Alphabet that's [õzɵkɵ], the second and third vowels very short and fleeting. That initial [õzɵ] is pretty much the same pronunciation as French onze “eleven” — completely unrelated in meaning, but used in my audio simulations of sound change, just for its pronunciation.

The initial vowel lost its nasality in most Slavic languages; for example, the Modern Slovenian form of this word is ozek [ozək]. Exceptionally, Polish retains some Proto-Slavic nasal vowels, though the initial vowel in its form of this word, wąski [vɑ̃ski], is unrounded and has developed a preceding [v]. In Russian, Bulgarian, Czech, Slovak and Bosnian-Serbian-Croatian, however, as well as losing its nasality, the old initial [õ] has been raised a little, to become [u]. A comparison of Slovenian ozek [ozək] with Bosnian-Croatian-Montenegrin-Serbian uzak [uzak] reveals the nature of that change very well.
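The whole path from [aŋg̟ʱ-] to [uz-] traced in this section can be summarized as a list of ordered rewrite rules applied one after another. The sketch below, in Python, is a simplified illustration rather than the method behind the audio simulations: the rule labels and the `derive` function are invented for this example, and the relative ordering of the consonant changes and the vowel changes is an assumption (the Proto-Balto-Slavic form *anźu- merely suggests that the consonant had softened while the nasal was still present).

```python
import unicodedata


def nfc(text):
    """Normalize to NFC so that composed and decomposed IPA strings match."""
    return unicodedata.normalize("NFC", text)


# Ordered rewrite rules: (label, earlier sound, later sound), in IPA.
# The ordering of the two groups is an assumption made for illustration.
RULES = [
    ("palatalization", "g̟ʱ", "ʤ"),
    ("softening", "ʤ", "ʒ"),
    ("softening", "ʒ", "z"),
    ("nasalization", "aŋ", "ã"),
    ("rounding", "ã", "õ"),
    ("denasalization", "õ", "o"),
    ("raising", "o", "u"),
]


def derive(form, rules=RULES):
    """Apply each rule in order and return every intermediate form."""
    form = nfc(form)
    history = [form]
    for _label, old, new in rules:
        form = form.replace(nfc(old), nfc(new))
        history.append(form)
    return history


if __name__ == "__main__":
    print(" > ".join(derive("aŋg̟ʱ")))
```

In this sketch the stage after rounding corresponds to the start of Old Church Slavonic ǫzŭkŭ, the stage after denasalization to Slovenian ozek, and the final stage to uzak.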

The pursuit of relatives and ancestors of English constantly takes us eastwards: from England back to Angeln, from Northern Europe to Central and South-east Europe. For many decades, the consensus among historical linguists and linguistic archaeologists has been that the oldest traceable ancestor of all the Indo-European languages, Proto-Indo-European, probably developed in a region north of the Black Sea, in modern-day Ukraine and neighbouring regions in South-West Russia. In recent years, arguments for Indo-European origins lying further east and south of those regions have drawn strength from new research into population movements traced through the analysis of ancient DNA (cite Southern Arc paper), and from parallel new work using computational modelling of Proto-Indo-European phylogeny (Heggarty et al. 2023).


1.3 Further eastern relatives

The Islamic Republic of Iran is an oil-producing Middle Eastern country with a majority Muslim population. Its main language, Persian (Farsi), is written using the Arabic alphabet, as are the languages of its regional neighbours in Iraq, Afghanistan, Pakistan and the Gulf states. But though Persian has borrowed many words from Arabic, it is not linguistically related to Arabic: rather, it is an Indo-European language related to Greek, Latin, Sanskrit, Russian, English and most European languages, as well as Pashto, Balochi and others in Afghanistan, Urdu in Pakistan and Hindi in India. Consequently, many Persian words seem somewhat familiar to European ears, for example tondar (thunder), sitare (star), mush (mouse), abar (over), bu (be), yog (yoke), naf (navel), lis (lick), vakhsh (wax), and dukhter (daughter). The relatedness of many other Persian words to English is not always obvious, though philologists have traced the links in two centuries of detailed research into the etymologies of the Indo-European languages; for example, Persian pir (fire) is related to Greek pyro-, and balesh (cushion) is related to English belly, bulge, and bag. Borj (tower) is related, via the meaning “high place”, to “raised earthwork” terms such as borough, bury, (long) barrow, as well as German Berg (mountain). The relationship of Persian deraz to English long is fairly opaque, but can be traced by a chain of small changes in pronunciation: long comes from Anglo-Saxon and Proto-Germanic lang, which philologists trace back to *dlonǵʰ in the reconstructed Proto-Indo-European ancestor language. In the Indo-Iranian branch of our linguistic family tree, dlong became derang in Middle Persian and eventually deraz in Modern Persian.

In recent decades, a fruitful convergence of work in historical linguistics, archaeology, genetic studies of Eurasian population movements in prehistory, and computational phylogenetic modelling of languages has yielded significant advances and refinement in our understanding of where the speakers of Proto-Indo-European came from. Although there remain many uncertainties and grounds for scholarly debate (I do not aim to settle them), a current consensus is that the Proto-Indo-European ancestor of modern Indo-European languages was spoken by horse-rearing nomadic pastoralists on what is now the Ukrainian steppe about 6000 years ago (Mallory 1989, Anthony 2007, Olander 2019). Various populations migrated away from there in different directions and at different times, giving rise to the diversification and wide geographical spread of modern Indo-European languages (Figure 2). While speakers of what later became Germanic languages spread westwards and northwards, speakers of dialects that became Iranian and Indo-Aryan languages spread further east and south; the speech of populations migrating northwards developed into modern Baltic and Slavic languages. The long time-depth and wide geographical dispersal of the various branches of Indo-European mean that the phonological relationships between modern forms of words in different languages are often rather opaque, but two centuries of philological scholarship have yielded detailed hypotheses about the ancestral forms of words, which vowels, consonants and accents it seems necessary to postulate, as well as how the ancestral words were altered by chains of successive modifications to yield present-day pronunciations.




Figure 2. The spread of Indo-European languages from a supposed origin in the Southern Ukrainian and Russian steppe region, c. 4000 BC, moving outwards to the various locations in which modern descendants are found, using cognate forms of five as a representative example. Click on the text or arrows to hear how words for “five” derive from Proto-Indo-European *pénkʷe or its derived form *pnkʷt. Key: black ‒ Proto-Indo-European; dark green ‒ Slavic languages e.g. Polish, Ukrainian, Bosnian; light green ‒ Baltic languages e.g. Latvian and Lithuanian; red ‒ Germanic languages e.g. English, Low Saxon German; dark blue ‒ Italic languages e.g. Oscan, Latin and its descendants; orange ‒ Celtic languages e.g. Irish and Welsh; pale blue ‒ Albanian; yellow ‒ Greek; brown ‒ Indo-Iranian languages e.g. Ossetian, Kurdish, Persian, Tajik, Pashto, Balochi, Punjabi, Sanskrit, Urdu-Hindi, Assamese, Sylheti, Sinhalese. Base map: https://commons.wikimedia.org/wiki/File:Indo-European_branches_map.png


Over more than a decade, I have developed computational methods for simulating such histories of changing sounds using audio recordings and a career-long experience of research in methods of acoustic analysis and speech synthesis. In the work presented on the Audio Etymological Lexicon (chapter 2 of this hypertext book), I employ those methods to try to make an ambitious contribution to historical phonology, by exploring how prior findings and methods of that field can be completely re-cast or reimagined in acoustic-phonetic terms. For example, as mentioned above, at http://www.ancientsounds.net/#long you may hear my simulation of how the Modern English pronunciation long can be morphed backwards in time to Proto-Indo-European *dlonǵʰ‑; from there the simulation travels forwards in time to Persian deraz, making the theoretical propositions of the philologists immediately audible for anyone to hear. In a similar way, the Persian word shir (tiger), as in the name of the Jungle Book character Shere Khan, is related to English fierce and feral through our shared Proto-Indo-European ancestral pronunciation *ǵʰwēr‑; Mandarin Chinese 狮子 shizi (lion) is also related, as a borrowing from Persian. And the clickable map in Figure 2 illustrates many paths of sound-change by which words for “five” in many modern Indo-European languages derive from Proto-Indo-European *pénkʷe or its derived form *pnkʷt.

The words selected illustrate a variety of well-studied processes of sound-change, such as the Great Vowel Shift and Grimm's Law in the history of English and Germanic, and “Satemization” in the history of Iranian, Sanskrit and Balto-Slavic languages, topics that are discussed in considerably greater detail in chapter 4.

All entries in chapter 2, the Audio Etymological Lexicon, have been tagged by their English word, so that in order to jump to any particular headword it is possible to use the URL http://www.ancientsounds.net/#<headword>, such as http://www.ancientsounds.net/#aghast
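The URL pattern described above can be generated mechanically. The following Python sketch shows one way to build such a link; the function name `entry_url` is hypothetical, and the lowercasing of the headword is an assumption based only on the two examples given in this chapter (#long and #aghast), not a documented rule of the site.

```python
from urllib.parse import quote

BASE_URL = "http://www.ancientsounds.net/"


def entry_url(headword: str) -> str:
    """Build the lexicon link for an English headword.

    Assumption: anchors are lowercase, as in the examples #long and #aghast.
    """
    anchor = quote(headword.strip().lower())
    return f"{BASE_URL}#{anchor}"


if __name__ == "__main__":
    for word in ("aghast", "long"):
        print(entry_url(word))
```

Percent-encoding with `quote` leaves plain ASCII headwords unchanged and only matters if a headword contains spaces or other special characters.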

For an alternative presentation of that lexicon, chapter 3 gives an alphabetically ordered list of microblog posts about each word, aimed at a general readership. Some of these entries reproduce comments from public commentators on my original posts. Chapter 4 is a more technical, academic discussion of the phonetics and phonology of Proto-Indo-European as exemplified through the resources presented in the Audio Etymological Lexicon, and chapter 5 is a brief explanation of the methods used to generate the simulations of sound changes and pronunciations from the past.


Sounds in hypertext

As may already be apparent from this page, words or phrases given in blue are clickable links to audio simulations of various kinds; clicking on those links will play the audio without opening a different tab or webpage. As is conventional on websites, words or phrases that are blue and underlined are links to other webpages or files; clicking on them may open a new tab or initiate the downloading of a sound file to your browser. Given the low quality of audio reproduction on many devices, I recommend wearing headphones of almost any kind for a better experience of these audio materials.


1.4 Acknowledgements and permissions

Over 3600 sound files are presented in this site, simulations of present-day and ancient pronunciations of words in many languages. Most of them are not original sound recordings as such, but are digitally processed and in many cases entirely synthetic simulations of pronunciations, created using speech technology and audio processing software. They are, however, based upon an extensive library of recordings of many languages that I and my research assistants and language consultants have collected over several decades; as this collection is not sufficient to provide examples of all the words in all the languages in the scope of this project, it also uses recordings published openly on the public internet, in e.g. online language lessons, online learner's dictionaries, academic collections such as dialect corpora, and online games. While much of this material is permitted to be used for academic research and educational use under copyright rules, it is often not permitted, nor would it be reasonable, simply to re-publish examples obtained from third parties. However, the nontrivial audio processing that my methods employ means that the materials produced in the simulations given here are “derived works”, offered here to be used or re-used for non-commercial purposes according to the terms of the Creative Commons CC BY-NC-SA 4.0 licence, without any need to ask for specific permission.

Special thanks are owed to many speakers whose voices were used in these simulations and their underlying recordings: to Ranjan Sen for English and Latin; Adam Sheingate and Karen Park for some of the American English recordings; Laura Ashe for her expert pronunciations of Anglo-Saxon/Old English; Suhas Mahesh for most of the spoken Sanskrit recordings; Mary Baltazani for fieldwork recordings of many speakers of Greek; Ali Hussain for recordings of several speakers of Siraiki, Balochi, and Pashto, with additional Pashto material from https://dsal.uchicago.edu/dictionaries/heston/; and to Dominyka Verikaitė for Lithuanian. Simulations in other languages are based upon recordings in openly available publications on forvo.com, dict.cc, glosbe.com and Wikimedia Commons.
