Languages change over the centuries — the Old English (or “Anglo-Saxon”) out of which modern English has evolved over the past millennium is recognizably related to present-day English, but it is so different that if people were still speaking it somewhere we would certainly count it as a separate language. We could not understand them without a course of language lessons.
Old English takes us a little over one thousand years back, and it is the earliest ancestor-language of modern English that had a written form. If we are willing to accept partial information, though, we can get far further back than that.
This is because languages belong to families. After a language has spread over a sizeable territory, particularly in pre-modern conditions in which travel and communication are limited, the largely random changes which happen to all languages everywhere will be different from area to area. After a time, what began as local dialects will diverge into separate languages. So, for instance, French, Italian, Spanish, Portuguese, and Rumanian all trace their ancestry back to Latin. This is a special case, because the mother language was already a written language associated with a high civilization: consequently, many of us spent years of our lives learning it at school. In other cases, the mother language was not a written language, and we do not know what its speakers called it (if indeed they had a clear concept of their language as a namable thing alongside other languages); but we can form a fairly clear picture of the mother language by comparing the earliest recorded forms of the daughter languages.
So, for instance, English, together with Dutch, German, Danish, Norwegian, Swedish, and Icelandic, goes back to a language which we now call “Proto-Germanic”. This was a living spoken language at about the same time as Latin, broadly two thousand years ago, but unlike Latin it was not written.
(Of course, a language can influence another language without being its ancestor. English does not descend from Latin, but we have a huge number of words in English that derive from Latin: that is because, until very recently, learning Latin was a basic part of a European’s education, so when new words needed to be coined it was natural for educated people to reach back to Latin as a source. Also, in the year 1066, England was conquered by French-speaking Normans, and for several centuries their dialect of French became the language of government in England; consequently many words flowed from French into English. But there is no difficulty in separating out the layers of Latin- and French-derived vocabulary, which are borrowings, from the “native” English vocabulary, which goes back to Old English and through that to Proto-Germanic.)
The several thousand languages now spoken in the world comprise many different families. Most linguists believe that language did not originally arise in just one place but independently among different human groups — or at least, that if all present-day languages do ultimately derive from a single shared ancestor, that ancestor language must have lain so far back in the past, and the various descendant languages must have changed so massively, that data available to us now could never show such relationships. English, Chinese, and Swahili, to take three examples, look entirely different from one another (except for borrowings that occurred in historical times — tea certainly comes from Chinese, because the word came to Europe with the thing). So far as anyone knows or is ever likely to know, these languages are indeed unrelated at even the remotest level.
But English belongs to a family much wider than just the Germanic languages. Germanic — the set of languages descended from Proto-Germanic — is one branch of the “Indo-European” language family. “Italic”, which covers Latin and its descendants together with a few obscure related languages of ancient Italy, is another branch (the modern languages descended from Latin are called the “Romance” languages). Altogether, the Indo-European family has about twelve main branches at this level, Germanic and Italic being two of these. The family includes almost all languages of Europe (Hungarian is one of the exceptions), together with various languages of Western Asia and Northern India. About half of the world’s present population speak some IE (Indo-European) language as their mother tongue. All these languages are held to derive ultimately from a single ancestor language, which we call “Proto-Indo-European”, or PIE for short.
In a sense, this is not a mere hypothesis. Saying that various languages are related to one another means that they share a common ancestor language. Anyone who doubted that PIE really existed would be saying that some modern languages which we take to be related are not genuinely related languages. But we think the modern data show that all these languages, including for instance Albanian, Persian, and Hindi, definitely are related languages, so we believe that there must have been a PIE language, once.
Although PIE is very remote from us in time, and there are obviously no written records of it, not even fragmentary ones, linguists working over the two hundred years since these issues were first recognized have reconstructed quite a lot about what the language was like. They have triangulated from the various descendant languages. Broadly, the logic runs: “If such-and-such an aspect of language structure is like this in subfamily A, like that in subfamily B, and like that in C, what single kind of earlier structure could plausibly have developed in each of those directions?” Many aspects of the reconstruction are tentative or controversial (that’s science for you), but the reconstruction as a whole is reasonably solid: it is certainly much more than a collection of speculations and wild stabs in the dark.
In the past, this material had been scattered in obscure books and academic journal articles. It has recently been assembled in one place, namely a large and excellent book edited by J.P. Mallory and D.Q. Adams, Encyclopedia of Indo-European Culture, Fitzroy Dearborn Publishers, London and Chicago, 1997. (NB: the spelling “Encyclopedia” is not an error; Americans spell it that way.) My account here will draw heavily on Mallory and Adams’s compilation. Another book, an introductory one, is Robert Beekes, Comparative Indo-European Linguistics: An Introduction, John Benjamins, Amsterdam, 1995 (originally written in Dutch, and not terribly well translated, unfortunately), and I shall draw on Beekes’s book also. Beekes takes a different line from Mallory and Adams on some issues, but I shan’t go into that level of detail here.
The best guess at when PIE was spoken puts it at something like six thousand years ago, give or take a millennium or so. There is much controversy about where it was spoken. For a long time the most usual answer was the Southern Russia/Ukraine region, but nowadays this is just one theory among others.
(In recent years there has been much popular interest in an idea put forward in the 1987 book Archaeology and Language: the Puzzle of Indo-European Origins, by Colin Renfrew – now Lord Renfrew, an eminent but linguistically rather naive archaeologist – according to which the Indo-European languages originated in Anatolia. The kernel of truth here is that, well after Indo-European was recognized as a language family, it was discovered that certain long-dead languages of Anatolia, including Hittite and Luwian, were themselves related to the previously-identified IE languages, but appeared to have split off from the rest of the IE stock before the latter diverged into subfamilies such as Germanic, Italic, Celtic, and so forth. Renfrew believes that agriculture first spread into Europe from Anatolia, and he argues that the IE languages arrived along with the practice of farming. On the origins of European agriculture Renfrew may be correct – I am not qualified to judge; and for all I know he may be right to surmise that the earliest IE languages spoken on European territory came from Anatolia. But the languages spoken in Europe in the historical period do not descend from Anatolian languages, which became extinct without descendants long ago. As for instance Margalit Finkelberg has pointed out, in Greeks and Pre-Greeks, 2005, p. 53n, the Anatolian languages as a group are so different from the other IE languages that the two groups must have developed separately over a long period. Thus, wherever the languages now spoken in Europe originated, it seems to have been somewhere far away from Anatolia.)
It is important to grasp that PIE is not anything like “the first human language”, or even “the original ancestor of our languages”. Language, in Europe and everywhere else in the world, undoubtedly has existed far, far longer than just six thousand-odd years. PIE is simply the earliest language it is possible to reconstruct from the evidence of modern and recent IE languages. If the entire linguistic family tree of which it is a part could be spread out to view, PIE would be one node near the bottom of the entire tree, distinguished by the fact that many of the branches it dominates reach right down to the present day. There would be masses of branching higher up in the tree, but the branches which did not lead down to the PIE node would eventually end in twigs suspended in mid-air — representing languages which became extinct too long ago for us to know anything about them.
Nevertheless, PIE is sufficiently old that it may possibly have had properties that would make it seem not just “different” but somewhat “primitive”, if we could encounter it as an actual spoken language today. Nobody would expect PIE to have had words for “television” or “banana” — obviously. But, more interestingly, Mallory and Adams point out for instance that the PIE word for “nine” seems to derive from the word for “new”; they suggest that “nine” may originally have been called “the new number”, implying that having a name for such a big number ranked for PIE speakers as a whizzy technological breakthrough. (In English, the pronunciation of these two words has developed rather differently, but notice that in German neun and neu are closer, and in French neuf has both meanings.)
For a long time, it has been suspected that PIE may have been structurally simple, relative to present-day languages, in ways that go deeper than lack of particular vocabulary items. More than a hundred years ago, Eduard Hermann argued that PIE may have had no complex sentences: all utterances would have been strings of simple clauses, with no clause subordination. Instead of saying things like “When he saw the stone he wanted, he shouted out”, PIE speakers might have said things more like “He saw a stone. He wanted that stone. Then he shouted out.”
In the closing decades of the 20th century, this and similar ideas were widely rejected, not so much because of factual evidence as for ideological reasons. Many linguists wanted to think of all human languages as equal, and they disliked the suggestion that languages could be ranked as more or less evolved.
However, a careful, scholarly book by Guy Deutscher (Syntactic Change in Akkadian, Oxford University Press, 2000) has now shown that this principle of linguistic equality is not really tenable. The most ancient languages which were recorded in writing had very limited systems of grammatical subordination; some languages spoken by simpler, tribal societies today demonstrably are less evolved than modern European languages in this respect. So it does seem quite possible that Hermann’s suggestion about PIE may have been correct.
The best way to show what PIE was like is to say something in it. The language is reconstructed well enough that scholars have felt reasonably confident in assembling little specimens of PIE prose. One such specimen is based on a short extract from Old Indic literature. This is material that was transmitted from generation to generation by word of mouth before first being written down, and it may represent the earliest genre of literary composition recorded in any IE language. S.K. Sen picked a simple passage in which the Old Indic vocabulary is known to correspond to PIE roots rather than later neologisms, and took a consensus view from the experts on what the passage would look like if the Old Indic structures and sounds were rolled back two or three further millennia to their PIE antecedents. Here is the passage in English translation:
Once there was a king. He was childless. The king wanted a son.
He asked his priest: “May a son be born to me!”
The priest said to the king: “Pray to the god Varuna”.
The king approached the god Varuna to pray now to the god.
“Hear me, father Varuna!”
The god Varuna came down from heaven.
“What do you want?” “I want a son.”
“Let this be so”, said the bright god Varuna.
The king’s lady bore a son.
When Mallory and Adams print the PIE version offered by S.K. Sen, E.P. Hamp, and others, they use the spelling system traditional among Indo-Europeanists for representing PIE sounds. This involves many letters which carry accents and other diacritic marks, sometimes two on one letter. These are hard to display reliably in an HTML Web page (even in the English translation above, I have left out the dot which should appear under the “n” of the name Varuna, as transcribed from Sanskrit). In any case, this traditional spelling system looks peculiar and offputting to English-speaking readers, and it is more traditional than exact: some aspects of it reflect ideas which 19th-century researchers had about PIE and which are now generally thought to have been mistaken.
I have preferred to use different spelling conventions for PIE, which capture the same information about pronunciation, but represent it using ordinary letters and letter-combinations that look as “normal” as possible. They cannot look very normal; this was a language extremely different from English and from the languages that English has borrowed vocabulary from, and it had sounds that in some cases were quite unlike the sounds we are familiar with.
I shall explain the spelling system used here shortly. But first, here is the passage about the childless king, as it might have been uttered by a PIE speaker — by one of our linguistic ancestors, some six millennia ago:
To réecs éhest. So nputlos éhest. So réecs súhnum éwelt.
Só tóso cceutérm prcscet: “Súhnus moi jnhyotaam!”
So cceutéer tom réejm éweuqet: “Ihgeswo deiwóm Wérunom”.
So réecs deiwóm Werunom húpo-sesore nu deiwóm ihgeto.
“Cluttí moi, phter Werune!”
Deiwós Wérunos kmta diwós égweht.
“Qíd welsi?” “Wélmi súhnum.”
“Tód héstu”, wéuqet loukós deiwos Werunos.
Reejós pótnih súhnum gegonhe.
(For four of these words, some of the scholars consulted included an extra sound, because of — for instance — different ideas about which verbal inflexion would have been used in a particular context. I certainly am not qualified to adjudicate such issues, so I have arbitrarily chosen the shortest alternative in each case.)
The word for “priest”, cceutéer, literally meant “pourer”; speakers of the early IE languages seem to have seen priests as men who poured libations to the gods.
Although this passage looks very queer at first sight, if you know a few bits and pieces of recent European languages you can quickly make links with some of the words. The second word, réecs for “king”, for instance, is identical to the Latin word for king, rex. Latin happened to write the combination of sounds cs with a single letter X, as we do in English, but that is just a convention of writing (a rather irrational one). The PIE word has a double E, meaning that the vowel is long rather than short; Latin spelling did not distinguish long from short vowels, but spoken Latin did, and rex had a long E, not a short E. So far as we can tell, this particular word did not change at all over the thousands of years that separate PIE and Latin.
In the third sentence, súhnum for “son” shows a resemblance with English — and not with Latin, where the equivalent word, filium, is based on a quite different root. (Note that in PIE the verb usually came at the end, so in word-for-word translation this sentence runs “The king son wanted”.) In this case it is believed that the Germanic branch of the IE language family preserved the original PIE word, while the Latin branch happened to replace it with a different word (Latin filius, filia for “son”, “daughter” may possibly derive from a root meaning “suck”, offspring being thought of as sucklings). On the other hand, the relationship with Latin comes out in the variation of endings between súhnum in “(wants) a son”, object, and súhnus in the next line, “(may) a son (be born)”, subject — in Latin these words would be filium, filius, respectively.
What about the peculiar-looking kmta, “down”, in the sixth line?
In the first place, despite appearances kmta is a two-syllable word. In PIE, consonants such as m could act as what phoneticians call “syllabic” consonants: they could function as the vowel of a syllable even though they are really consonant sounds. We have syllabic consonants in English too: spelled phonetically, a word like “fathom” would be “fathm”, since its second syllable has no true vowel and the “m” sound acts as if it were a vowel. PIE had a lot of these syllabic consonants. In the first line, nputlos “childless” is a three-syllable word whose first syllable is a syllabic n-. In the second line, prcscet “he asked” is a disyllable whose first syllable has a syllabic r.
The two-syllable word kmta corresponds to a Greek word which is to some extent familiar to all of us, even if we have not studied Greek. A “catastrophe” was in Greek a “turning-down”, and “catarrh” a “down-flow”: the word kata was the Greek for “down”. The Greek language had no syllabic consonants; its syllables all contained true vowels, because sound-changes on the road from PIE to Ancient Greek turned the various syllabic consonants into vowels or vowel-consonant combinations. The suggestion is that in Greek kata, the first A is a vowel which used to be a consonant, while the second A is a vowel that always was a vowel. By the time the Greeks got hold of the alphabet, of course, they had no knowledge that long before their time the two A sounds had been different, so they wrote them the same.
Now to explain the PIE spelling in a bit more detail.
The first point to note is that reconstructions of PIE sounds are sometimes more like abstract tokens than concrete descriptions of pronunciation. That is: if three PIE words are each reconstructed as beginning with t-, say, then this means we are fairly sure they all began with the same sound as one another; but we cannot be sure exactly how that sound was pronounced. In most cases we do know something, though. If the sound is spelled with t-, we are rather sure that the PIE sound was much more like an English T than an English F, say.
The weakest aspect of the reconstructed pronunciation has to do with the sounds I have transcribed with the letter H. PIE is believed to have had four different sounds called “laryngeals”, that is, sounds made somewhere in the back of the mouth or throat. One of these may have been like an English H. Some of the others may have been like the sounds of modern Arabic which are transcribed into our alphabet using apostrophes, reversed apostrophes, or the figure 9. We have hardly any clues about the precise phonetic value of the laryngeals, because they dropped out long ago from almost all the recent IE languages: they are inferred largely from the effect they had on neighbouring vowels. (When h occurs in Germanic words, it derives from a k-like sound in PIE, not from any of these laryngeals.) Since we know so little about them, there seems no point in representing the laryngeals differently, so I have simply written them all alike as H. Even the laryngeals could be “syllabic”, as in phter “father!”. (One guess at what a syllabic laryngeal sounded like is the obscure vowel or “shwa” which British English speakers write as er and Americans write uh.)
Double vowels, as we have seen, represent long vowels. PIE had separate long and short vowels: ii versus i, aa versus a, and so on. Thus, i and ii may have been roughly similar to the vowels of English “sit” and “seat” respectively.
In English, so-called “stop consonants” have a contrast between two “manners of articulation”, which phoneticians call “voiceless” and “voiced”: k : g, t : d, and so on. PIE stops had a three-way contrast, which I am showing as t : d : tt, p : b : pp, and so on. We do not know just how these three types of stop were made. One theory is that the sounds written with English voiced letters, such as d and b, were in PIE not voiced consonants but “ejectives”, produced with a closed glottis; and that tt, pp, and so on were “aspirated stops” said with a puff of breath following the consonant release. That would give a perfectly plausible three-way system — modern Korean is rather like that. But there are many other possibilities. In traditional Indo-European studies the sounds I am writing as tt, pp, etc. were written dh, bh, and were taken to be “voiced aspirates”, such as occur in some current Indian languages. All we really know is that PIE triples such as t, d, tt represent three different “manners of articulation”, but we don’t know just what they were.
In the case of t, d, p, b we are fairly sure that the “place of articulation” was similar to the English sounds written with those letters. But in the k/g area, PIE had distinctions that we have not got in English. I am arbitrarily using the letters c, k, q to represent three varieties of k-like sound. The sound written q, we are fairly sure, was like “kw” run together as a single sound — in PIE q was different from cw as a sequence of two sounds. It may be that c was the English c/k sound, while k was a sound like the Arabic “uvular” stop that occurs at the beginning of a name like (the Sheikhdom of) Qatar. Or (this was the traditional Indo-Europeanists’ view) k may have been the ordinary English sound, while c represented “ky” run together as a single sound.
Whatever precise sounds are represented by c k q, each of these also enters into a three-way contrast parallel to t d tt. Thus there are sounds spelled cc kk qq; and there are also three sounds in the k/g area with the same manner of articulation as b and d (ejectives, if that is what these were). I use g for the “ejective” equivalent of k, but since our alphabet has not got three G-like letters parallel to its three K-like letters I have to use makeshifts for the “ejective” equivalents of c and q. I spell the former as j — that is, if c sounded like “ky” run together as one sound, then j will have sounded like “gy” run together in the same way. The letter j is perhaps not a very happy choice here, but in one way it is appropriate: in some of the descendant languages, this PIE sound turned into an English “j” sound — the word for “king” in inflected forms has the root réej- (cf. the third line above), which is why Indian kings were rajahs. For the “ejective” equivalent of q the best I can do is gw, though this is ambiguous: there was a difference in PIE between gw as a single sound, and g followed by w.
In case you are beginning to think that reconstructed PIE had a suspiciously large range of different sounds, notice that while there are many different stops, in some other areas there are fewer sounds than a typical modern European language possesses. Most present-day European languages have a series of “fricative” sounds — English s, th, f, v, and so on. PIE is believed to have had just the one, s. On a world scale, the sound pattern reconstructed for PIE does not look unusually complex or implausible. It is not very similar to patterns found in present-day European languages, but then these are fairly different from one another (for instance, French has a range of nasal vowels, English has none). Anyway, would it not be surprising if there had been no dramatic changes over a period of six millennia?
Letters that I have not mentioned were pronounced more or less as you would guess. The acute mark on some syllables shows that those syllables carried an accent of some kind — possibly, they were said on a relatively high pitch.
The passage shown above is not the only attempt that has been made to reconstruct a piece of PIE as she was spoke. The classic attempt at such an exercise was made in 1868 by the German linguist August Schleicher. Schleicher was the first man to produce a family-tree diagram for the interrelationships among the IE subfamilies (people disagree with details of his tree structure nowadays, but it seems to have been broadly along the right lines). He was also renowned for bringing linguistics into relationship with Darwin’s ideas in biology. The idea that a diverse set of present-day forms could result from gradual modification in descent from a common ancestor had come to the fore rather earlier in language studies than in biology. When Schleicher read Darwin’s Origin of Species he became excited by the parallels, and argued that the two domains were closer to one another than one might suppose. He urged that languages should be seen as true living organisms alongside plants and animals, not just metaphorically but in sober reality. In the event this idea did not survive, but it was the kind of “honourable error” which can sometimes be more intellectually interesting than the writings of other scholars whose ideas are not original enough to get rejected.
Rather than translate an existing piece of prose into PIE, what Schleicher did was make up a little story of his own, which allowed him to choose vocabulary whose PIE equivalents he knew. The details of PIE were much less fully worked out in Schleicher’s day than they have been since, and his PIE rendering of his tale looks off-beam now. It gave what would nowadays be seen as excessive weight to the evidence from the Indo-Iranian subfamily as against all the other branches of IE. The same passage has been reworked several times by later scholars, as IE research has progressed. Again, I shall quote Mallory and Adams’s up-to-date version, using the same spelling system as before.
In English the story runs:
On the mountain a sheep that had no wool saw horses — one pulling a heavy waggon, one a great load, and one swiftly carrying a man.
Then the sheep said to the horses: “It pains my heart to see a man driving horses”.
Then the horses said: “Listen, sheep: it pains our heart to see man, the master, making himself a warm garment from sheep’s wool, when the sheep has no wool”.
On hearing this, the sheep fled into the plain.
Not quite so artistically satisfying as the childless-king passage, perhaps, but so be it. (Incidentally, Schleicher’s tale did include some subordinate clauses: ... that had no wool, ... to see a man. Schleicher composed his tale decades before the paper by Eduard Hermann mentioned earlier was published; if Hermann was right, these particular features of Schleicher’s reconstruction must have been mistaken.)
Here is a PIE version:
Gwrhéei hówis, qésyo wlhnéh ne est, hécwons spécet, hoinom kke gwrhúm wóccom wéccontm, hoinom-qe méghm ppórom, hoinom-qe ccménm hóocu ppérontm.
Hówis tu hecwoippos weuqét: “Céer hekknutór moi, hécwons héjontm hnérm widntéi”.
Hécwoos tu weuqónt: “Cluttí, hówei, céer kke hekknutór nsméi widntppós: hnéer, pótis, héwyom r wlhnéhm seppi qrnéuti nu qqérmom wéstrom; nécci héwyom wlhnéh hésti”.
Tód cecluwóos hówis héjrom ppugét.
It is perhaps a pity that my chosen spelling system requires this passage to begin with a fairly unpronounceable-looking sequence of letters in gwrhéei, “on [the] mountain”. But it is not as weird as it looks. The initial consonant is the kw-said-as-one-sound, pronounced as an ejective (or whatever type of articulation the “voiced” consonants corresponded to), and it is followed by a syllabic r.
Many readers will know that the Balkan state which we call Montenegro, Italian for “black mountain”, is called by its own people Crna Gora, or in other Slavonic languages Cherna Gora; gora is the standard Slavonic word for “hill, mountain”. The w-colouration of the PIE gw explains why, when the syllabic r turned into a vowel + consonant sequence, the vowel that appeared was o (why “mountain” is gora rather than gera, say). The vowel of the second syllable is an inflexional ending for the meaning “on the mountain”: in Slavonic languages today, gora ends in a only when it is the subject of the verb, and the a changes to something like an e (depending on which Slavonic language we are talking about) to show “place where”.
It would be tedious to go through the entire passage in this way. But notice the word for “heart”, céer; in some circumstances it has an extra sound, céerd. From French one can easily see the link with coeur (in the intermediate language, Latin, the word was cor, or in inflected forms “cord-”, as in “cordial”, which originally meant “hearty”). In Greek, “heart” is kardia, which is why a heart specialist is a cardiologist. And remember, as I mentioned earlier, that in the Germanic subfamily k-like sounds in PIE come out as h. Our word heart, too, was originally this same PIE word céer(d).
The Latin for “horse” was equus, as in English “equine”, “equitation”. It is easy to recognize this root in hécwons (first line of the passage), given that the PIE laryngeals dropped out in descendant languages. The fact that this particular root shows up in many branches of IE, demonstrating that PIE speakers knew what horses were, has been a major factor in attempts to locate the PIE homeland (archaeology shows that horses were much less widely distributed six thousand years ago than they are today).
In hecwoippos, “to the horses” (second line of the passage), Latinists might recognize a form of the distinctive Latin case ending -ibus for the dative plural, used for people or things that one speaks to, gives something to, etc. When I was at school, though, I certainly would not have got high marks for attaching the -ibus ending to the root of equus. “To the men” was hominibus, but translating “to the horses” as equibus would have been a major clanger: equus belonged to a different “declension”, meaning that it took a different set of endings to express the same meanings. (There were five regular declensions, plus irregular words.) In PIE, apparently, all nouns were inflected in more or less the same way, and diversity of declensions was a feature that developed as Latin evolved out of PIE, greatly to the sorrow of generation after generation of inky-fingered English schoolboys.
Undoubtedly, many details of these reconstructions would turn out to need modification, if (impossibly) a PIE speaker could return to life and show us how his language really worked. On the other hand, I am not sure that any of the world’s other language families provide data allowing even a tentative reconstruction of an ancestor language reaching as far back into the past as PIE. Readers interested in language prehistory might enjoy looking at the samples and discussion of the earliest reconstructable form of Chinese, in my book Love Songs of Early China. But, though on a world scale Old Chinese is a very “old” language, and it is strikingly different from any modern Chinese dialect, it is nowhere near as old as PIE.
PIE is not the earliest language spoken by Man — far from it. But it is probably the earliest we shall ever see.
last changed 20 Aug 2010