9 Oct 2007

"Mid Indo-European", Semitic and Neolithic numerals

Maybe I'm obsessive, but this whole thing about bad Nostratic reconstructions and ancient numerals deserves more discussion. Lots more. I have a love-hate relationship with the Nostratic theory. On the one hand, I'm convinced by its basic premise that certain language groups were related to one another within the past 15,000 years; on the other, I'm irritated by the results arrived at by people who don't seem to take enough time to work out the details. I especially appreciate some of Allan Bomhard's contributions to Nostratic, and yet I'm left wanting something more in-depth from him. I want to know exactly what happened in the past without it being doctored up with wishful, half-thought-out thinking, and I don't believe for a second that we know all there is to know.

Regarding Bomhard's general reconstruction of Proto-Nostratic, I believe that many of these "Nostratic" roots are not genuine. However, I also think that some of the listed items may instead be evidence of loanwords adopted from Proto-Semitic (PSem) into a stage of Pre-Indo-European (Pre-IE). To me, the example of IE *septm̥ from PSem *sabʕatum is the clearest and most undeniable case of Pre-IE borrowing, which is why I must sound like a broken record when I repeat it so often. So now let's get serious and propose something more realistic than 15,000-year-old numerals.

I suggest that the likeliest time for such an adoption of borrowings is the height of the neolithic around 6000-5500 BCE when trade is known to have flourished across Eastern Europe and the Near East. The neolithic was not just about a wide network of traded goods, but a newly expanded exchange of ideas and a greater sharing of common religious beliefs across larger spans of geography. This I believe would be the main reason behind the spread of "7" from Proto-Semitic into Pre-IE and other languages. Marija Gimbutas wrote about the neolithic period, although in my view she sometimes got too corrupted by feminist revisionism to be taken seriously. For instance, it's too simplistic to say that Indo-European speakers were all patriarchal warriors and native Europeans were all matrifocal pacifists. This sensationalism sells lots of books but the study of ethnology is far more complicated than this modern idealism.

During the neolithic, I envision a network of various groups speaking a number of Pre-IE dialects over a large territory, surrounded by some "Para-Pre-IE" dialects (i.e. indirect "cousins" of IE that were later taken over by expanding IE) and by non-IE languages. Pre-IE speakers would also have had a number of different traditions, belief systems and genetic origins, depending on the region one is speaking of. The core of Pre-IE would have been the areas west of the steppes. I separate the Pre-IE stages of Indo-European arbitrarily into three sections to keep things tidy in my head:

Old Indo-European (OIE) - 7000-6000 BCE
Mid Indo-European (MIE) - 6000-5000 BCE
Late Indo-European (LIE) - 5000-4000 BCE

I use Proto-Indo-European (PIE) to refer to the very last state of the language before it began to fragment into dialects like Proto-Anatolian. I believe that it was MIE that first adopted Semitic vocabulary, including a few numerals. In this chronology, MIE is a stage of Proto-Indo-European immediately before the event of Syncope (i.e. the point at which unstressed vowels were dropped or reduced in all positions, causing clustering and important changes to IE phonotactics). I place Syncope at the beginning of the Late Indo-European period, circa 5000 BCE. Before Syncope, MIE had far fewer clusters than LIE and had a predictable accent fixed on the penultimate or antepenultimate syllable. Now on that note, I would like to shamelessly propose for discussion the following theory, which I've been developing for years:
word       MIE (PIE)             Semitic
'three'    *tareisa (*treis)     *θalāθu
'six'      *sʷeksa (*sweḱs)      *šidθu[1]
'seven'    *septam (*septm̥)      *sabʕatum
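
To make the mechanics of Syncope described above a little more concrete, here is a toy Python sketch. It is only an illustration under loudly stated assumptions: the syllable divisions, the accent positions and the function name `syncope` are my own choices for the example, and the rule is simplified to "delete every *e or *a outside the accented syllable" (the real event may have reduced rather than dropped some vowels).

```python
# Toy model only: the syllable divisions and accent positions used below
# are illustrative assumptions, not attested or published data.
MIE_VOWELS = set("ae")  # MIE is said to have had only *e and *a


def syncope(syllables, accent):
    """Delete every *e/*a outside the accented syllable and join the rest.

    syllables: list of strings, one per syllable (assumed division)
    accent:    index of the accented syllable (assumed position)
    """
    out = []
    for i, syl in enumerate(syllables):
        if i == accent:
            out.append(syl)  # the accented syllable keeps its vowel
        else:
            out.append("".join(ch for ch in syl if ch not in MIE_VOWELS))
    return "".join(out)


print(syncope(["ta", "rei", "sa"], 1))  # treis
print(syncope(["sʷek", "sa"], 0))       # sʷeks
print(syncope(["sep", "tam"], 0))       # septm
```

The sketch does not mark the resulting syllabic resonant in *septm̥, nor does it convert between my notation and the traditional one (*sʷ vs. *sw, *k vs. *ḱ); it only shows how the clusters fall out once the unstressed vowels are gone.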

As we can see, the pre-Syncope vowels are necessary to fully understand what has happened. Since MIE had only *e and *a to fill syllables, final *-a was pronounced as schwa, thereby mimicking the Semitic nominative in *-u. The PSem stress accent was probably also predictable: it fell on the last non-final "heavy" syllable (one that was either closed, i.e. CVC, or contained a long vowel) or, failing this, on the initial syllable. In "three", the long front vowel of PSem was naturally heard by MIE speakers as a diphthong *ei since this was the closest approximation in a language without long vowels. The Semitic dental fricative (*θ) was normally interpreted as initial *t- or medial *-s- in MIE (shown in both "3" and "6"), both of which are again natural approximations in a language that lacks this sound (n.b. consider how many French speakers pronounce voiced /ð/ in "that" as /z/ or /d/ instead).
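
The PSem stress rule stated above is simple enough to sketch in a few lines of Python. This is a minimal sketch, not a definitive implementation: the syllabifications passed in are my own assumptions, the names `is_heavy` and `stressed_index` are made up for the example, and long vowels are assumed to be written with a macron.

```python
import unicodedata

# Assumption: long vowels are written with a macron (ā, ē, ī, ō, ū).
LONG_VOWELS = set(unicodedata.normalize("NFC", "āēīōū"))
SHORT_VOWELS = set("aeiou")
ALL_VOWELS = LONG_VOWELS | SHORT_VOWELS


def is_heavy(syllable):
    """A syllable is heavy if it is closed or contains a long vowel."""
    syllable = unicodedata.normalize("NFC", syllable)
    has_long_vowel = any(ch in LONG_VOWELS for ch in syllable)
    is_closed = syllable[-1] not in ALL_VOWELS  # ends in a consonant
    return has_long_vowel or is_closed


def stressed_index(syllables):
    """Index of the last non-final heavy syllable, else the first syllable."""
    # Walk backwards from the penultimate syllable; the final one is skipped.
    for i in range(len(syllables) - 2, -1, -1):
        if is_heavy(syllables[i]):
            return i
    return 0  # no non-final heavy syllable: stress the initial syllable


# Assumed syllable divisions for the three numerals in the table.
print(stressed_index(["θa", "lā", "θu"]))    # 1
print(stressed_index(["šid", "θu"]))         # 0
print(stressed_index(["sab", "ʕa", "tum"]))  # 0
```

Under these assumptions the accent lands on the second syllable of "three" and on the first syllable of "six" and "seven".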

The word "six" needs further explanation because it has confused many linguists as to why it should be that the Semitic cluster *-dθ- ended up as *-ḱs- in PIE[2]. First of all, we should notice that PSem *d is not the same phoneme as PIE *d. The important difference is that PSem *d was alveolar (as in English) while it was dental in PIE (as in French). This means that the Semitic sound as well as the following dental fricative were pronounced further back in the mouth than IE speakers were used to. In its place then, a velar stop would be an understandable replacement for the dental stop here and coincidentally, the *s in *-ks- would have necessarily been alveolar next to a velar stop since it's near impossible to pronounce a dental *s immediately after retracting the tongue. So now we can see why this was an optimal solution for IE speakers and furthermore there are many borrowings in other languages where stops are switched like this (note the history of the name Carthage). There is also the fact that from the perspective of markedness, to make a long story short, PIE *ḱ must logically be reinterpreted as a plain velar stop *k (not palatalized) while *k must have been a uvular or pharyngeal *q. However until the traditional notation is abolished, the topic of velar stops in IE will remain confusing and misunderstood.

I've probably raised more questions than answers with this topic of Mid Indo-European but hopefully this will inspire more discussion on the topic of Pre-IE because I think this untouched aspect of Indo-European linguistics is full of interesting possibilities. My roughly hewn theory may not be perfect but I think this is a better answer to the problem than Bomhard's implausible Nostratic roots, *sʷakʰsʷ- "six" and *sab- "seven" [3].

[1] Semitic reconstructions are from Gray, Introduction to Semitic Comparative Linguistics (1934), p.70. As an interesting aside, one may appreciate Klimov's Etymological Dictionary of the Kartvelian Languages for a run-down on Proto-Kartvelian *šwid- "seven", which is derived from a Semitic masculine, non-mimated form of the numeral, *sabʕatu (Akk. šibit).
[2] Page 106 of Martin Bernal's Black Athena: The Afroasiatic Roots of Classical Civilization is a perfect example of how many authors confuse rather than inform us on the topic by simply offering a raw dump of completely conflicting ideas that fail to answer to any appreciable degree how the words might or might not be plausibly related.
[3] Bomhard/Kerns, The Nostratic Macrofamily: A Study in Distant Linguistic Relationship (1994).

1. (Oct 9/07) Please note that while some may feel that the realization of PSem *š as MIE *sʷ is strange, there is precedent in English, French and Italian pronunciations of the sh-sound /ʃ/ as /ʃʷ/. See Ball/Müller, Phonetics for Communication Disorders, Chapter 14: English fricatives. Based on this, we may surmise that PSem *š was similarly pronounced as /ʃʷ/.


  1. The only problem I have with this theory is that I'm quite sure that h2 was in fact a voiced pharyngeal approximant; looking at how the ayin influences vowels in Arabic, I find this plausible. But if this is true, there's no clear reason why the Indo-Europeans did not borrow *sabʕatum as *seph2tm or something along those lines.

    Nevertheless the proposal is very attractive. I'm curious what you think of the laryngeal correspondence between the two languages.

  2. I knew someone was going to bring this up, but if I tried to cover every issue here, I'd have an article two miles long :)

    In my theory of Pre-IE, the original MIE phonetic realization of *h1, *h2 and *h3 was respectively /ʔ/, /x/ and /xʷ/. (My reasons for this relate to "Vowel Centralization" in early IndoAegean circa 8000 BCE and areal influence with pre-NWC in the West Asian steppelands. But ne'er you mind that for now.)

    In Late IE, *-h1- in medial positions was weakened and had an allophone /h/; *h2 became a uvular /χ/; and *h3 remained velar /xʷ/. I theorize a Laryngeal Shift after Syncope as a forced alignment between the "h-series" and the "k-series". Thus *h1 was now paired with *ḱ /k/, *h2 with *k /q/ (vowel-colouring phonemes), and *h3 with *kʷ. In Post-IE, *h3 was becoming voiced (/ʁ/) making it easier to weaken and leave behind its phantom labialization (/ɦʷ/ -> /ʷ/).

    So in MIE, with only velar fricatives to work with, null was a better approximation of a Semitic pharyngeal. People make the mistake of simply assuming that Pre-IE phonology was exactly the same as IE phonology. I don't believe that at all, and I think there are a number of reasons, based on internal reconstruction, to think otherwise.

  3. What to do with *weks which is given by Armenian, *ksek's which is given by Slavic languages and *kswek's which is given by Avesta?

    It looks like the original form was *kswek's, with the later ks : ks -> s : ks in most dialects.

  4. By the way, in Proto-Semitic there was a form *šišš-, beside *šidt-. If we consider *kswek's to be the original form, then the match -š- ~ -ks- is almost perfect.

    Then, -w- may be just an artifact of the preceding *penkwe. Numerals above 4 show no ablaut and are indeclinable, which suggests they were mostly used for counting in order (1, 2, 3, 4, 5) while trading, and nothing more. So this is plausible: *penkwe, *kseks > *penkwe, *ksweks.

  5. Carsten: "What to do with *weks which is given by Armenian, *ksek's which is given by Slavic languages and *kswek's which is given by Avesta?"

    One thing we don't do is put these outliers on a pedestal and ignore all the rest of the data motivating *sweks. It's about whether you're seeing the forest or only the trees. Numerals are very prone to irregular changes because they're part of a set of many members that can influence each other. That combined with extraneous number symbolisms and word puns can be a great force against regular sound correspondence. We need to take all these minor variations with a grain of salt. Most IEists accept *sweks as the cardinal form while deviations are regarded as dialectal, surfacing after the fragmentation of the PIE-speaking community.

    Carsten: "By the way, in Proto-Semitic there was a form *šišš-, beside *šidt-."

    It would be preferable if you paid due respect to the fact that generally Semiticists reconstruct *šidθ-, not **šišš- or **šidt- because the latter forms can be perfectly accounted for by simple assimilation. The sound of theta becomes esh in Hebrew and Akkadian while preserved in Ugaritic and Arabic. Notice that Ugaritic shows *θiθθ-, a radical assimilation that could only have taken place long after Proto-Semitic. The worst thing we could do here is assume that all these dialectal variations are attributable to the protolanguage.