2 Oct 2007

The Tower of Babel

Sergei Starostin, before sadly passing away from a heart attack at the early age of 52, headed the Tower of Babel Project, an online project whose name is an unfortunate reference to a Christian biblical myth about language origins as given in the Book of Genesis. Ironically, much as unquestioning literalists treat the Bible, many linguaphiles online are in the habit of quoting Starostin and holding his theories in great esteem while ignoring all the deep problems of logic that are inherent in them.

Now, before you believe in wiki-haste that I'm just a nagging oddball, please note that my view is on the side of academics who have large issues with Starostin's work. There is no conspiracy against Starostin and his followers. We don't have a hate-on for him as a person, and by all accounts he had a jovial spirit, but these critiques arise because his theories are simply not tenable and too far-flung to be of use to a disciplined linguist. If people take linguistic critiques too personally, that's their issue to deal with.
  • http://www.accessmylibrary.com/coms2/summary_0286-25704408_ITM
    "With fewer than 300 linguists in the world doing serious work on long-range comparisons, the discipline is small and perennially insecure about its scientific standards. Given the dearth of rigorous proof for some of Starostin's assertions, many American linguists felt within their rights to dismiss his research or at least to exclude him from their conferences and symposiums."
So let's see why this is the case from examples we can see within Starostin's own project. His site seems to tackle a myriad of proto-languages all at once, but it centers on his personal baby: "North Caucasian" (note that the name refers strictly to the languages of the northern regions of the Caucasus mountains and is not a racial term). Throughout his life he tried to force the two language families called Abkhaz-Adyghe and Nakh-Daghestanian into the same proverbial round hole. The view of actual Caucasian language specialists is that Abkhaz-Adyghe (AA) and Nakh-Daghestanian (ND) are entirely different in structure, phonology and word order. While AA tends towards simple three-vowel systems (some even claim two vowels), ND is radically rich in vowels and consonants. The two families couldn't be any more different from each other. Plus, any similarities between the two could easily have been caused by strong areal influence between them within such a small region and over a large expanse of time.

No matter. Starostin was unfazed and he set to work to create a hybrid language, more like a conlanger than a comparative linguist. He no doubt figured that his language would serve as a valid ancestor to both language families. There is only one problem: it works only if you ignore rigorous methodology. Here are just a few flaws that I notice with North Caucasian that proponents just don't care enough about:

1. Violations of phonological markedness are everywhere
I think Starostin had a "diacritic addiction". For example, all pronouns like first person *zō violate markedness because plain *z is so rare in his convoluted phonology, while palatal *ź is used to explain almost everything (see his list of 'z words' here, where his diacritic bias is self-evident). Phonemic exotica such as *ƛ̣ are employed far too often to make up for his failure to take the time to adequately demonstrate sound correspondences (e.g. the ambiguous reconstruction *Ł_ĕɫV̆ with link here). By using contrived sounds as a smokescreen, void of the discernible phonological structure that all human languages have, he can freely connect different etyma together no matter how large a leap we have to make to accept the connection, making it seem to laymen as though North Caucasian has a lot more evidence behind it than it actually does.

2. The pronominal system is unnatural

Considering that his reconstructed pronouns curiously use only voiced consonants, an attentive linguist might consider the effects of sentence-internal lenition on grammatical elements and more sensibly reconstruct unvoiced *s for *z in the 1ps pronoun or *t instead of *d for the 2ps pronoun. One might also reduce the bloated phonemic inventory to a more manageable level that could then be finally accounted for systematically by demonstrable proof, instead of leaving it all hanging as an empty assertion.

3. Shoddy claims in more understood language groups of his database compromise his credibility

Nothing could be more far-flung than Altaic **séjra "three" (link here) when Altaicists reconstruct *göl- "three" on more direct evidence (Mongolian gurav, Japanese kokono-). Despite the fact that it is agreed upon by specialists that Proto-Dravidian is reconstructed without voicing contrast in stops (read here), Starostin took an anarchist approach by representing Dravidian with them (link here).
The number and wide variety of protolanguages in his database is impressive. The sheer quantity of seeming information alone would cause many to believe that his work is exceptional. However, quantity is not quality. I think the issue with Sergei Starostin was that, like many language lovers, he failed to see that linguistics is a science and not a form of artistic expression.


  1. Altaicists reconstruct *göl- "three" on more direct evidence (Mongolian gurav, Japanese kokono-).

    I'm extremely interested in how they made this work.

    By default I'm extremely sceptical about including Japanese in the Altaic language family; to me there's simply not enough proof.

    Besides that, last time I checked kokono- is the root of nine, rather than three. Although three times three is nine, I doubt the Japanese were feeling so mathematical that they somehow had a 'square' suffix to make the word for three mean nine.

    And with my limited knowledge of Mongolian, and a not too helpful dictionary, I'm having no success at all finding out what gurav might mean. But last time I checked, classical Mongolian didn't have a /v/. What am I missing here? :D

    Besides that, I have also run into Starostin's theories, and although it's a nice effort, I think he has indeed lost focus of the science of comparative linguistics.

  2. phoenix: "Besides that, last time I checked kokono- is the root of nine, rather than three."

    Yes, that's right. Old Japanese kökönö- is supposed to originally mean "three threes" (note the element *kö which is related to this reconstruction). Presumably the liquid was dropped at the end. Now go ahead and butcher that hypothesis if you like ;)

    However, the funny thing about the native elements of the Japanese numeral system is that they exhibit a clear binary contrast. Apparently the Japanese were more mathematical than you appreciate. The binary contrasts using ablaut are clearer in Old Japanese:

    - pitö- "1" / puta- "2"
    - mi- "3" / mu- "6"
    - yö- "4" / ya- "8"

    As you can see, the ablaut pairs are also consistent: i/u, ö/a.
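
    As a rough illustration only (it tabulates the pattern, it does not test the hypothesis), the pairs above can be lined up in a short Python sketch. The names PAIRS, first_vowel and ablaut_table are made up for this example, and the stems are simply the ones quoted above; comparing only the first vowel of each stem is a simplifying assumption.

```python
import unicodedata

# Old Japanese numeral stems exactly as quoted above, keyed by (n, 2n).
PAIRS = {
    (1, 2): ("pitö", "puta"),
    (3, 6): ("mi", "mu"),
    (4, 8): ("yö", "ya"),
}

# Vowel letters used in this transcription (assumption: one letter per vowel).
VOWELS = set(unicodedata.normalize("NFC", "aeiouö"))


def first_vowel(stem):
    """Return the first vowel letter of a stem, or None if there is none."""
    for ch in unicodedata.normalize("NFC", stem):
        if ch in VOWELS:
            return ch
    return None


def ablaut_table(pairs):
    """Map each (n, 2n) pair to the first vowels of its two stems."""
    return {
        numbers: (first_vowel(low), first_vowel(high))
        for numbers, (low, high) in pairs.items()
    }


if __name__ == "__main__":
    for (n, double), (v_low, v_high) in ablaut_table(PAIRS).items():
        print(f"{n}/{double}: {v_low} ~ {v_high}")
```

    Run as a script, this lists the vowel alternation for each doubled pair, which makes it easy to see at a glance whether a newly proposed stem fits either of the two patterns.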

    phoenix: "And with my limited knowledge of Mongolian, and a not too helpful dictionary, I'm having no success at all finding out what gurav might mean."

    It just means "three". It's best to hear a local pronounce them. Thank god for YouTube:

    Count 1-10 in Mongolian

    Sounds like a -v in gurav to me :) Read more in this pdf about the Mongolian language where gurav is explicitly written in its grammar examples.

  3. Oh, I should add that in Classical Mongolian, the word for "three" was ghurban.

    And maybe I should also add that because of the mi-/mu- binary madness, it all seems to imply, at least to me, that mu- was the original word for "6" while an ablauted mi- replaced the former word for "three", namely *kö. Its survival in "9", however, is natural in such a system, where 9 is not divisible by 2 and thus does not have an ablaut pair.

  4. Wow, that's pretty exciting!

    And I had never expected that I'd agree with a reconstruction of three using the word for nine. :D

    Besides that, as for those numerals, I can't believe I never realised that myself. Although I had noticed the hitotsu-futatsu thing.

    I have a slight problem with the mi-mu correspondence though. As you probably know, the word for three is actually 'mittsu' with a geminate, whereas six is 'mutsu'. Shouldn't the geminate be some kind of indication of an assimilated coda consonant? That way the correspondence wouldn't be 100%. It makes you wonder where that coda consonant came from.

    Or considering Yottsu and yattsu, maybe we should wonder why the geminate disappeared in mutsu.

    Looking in my dictionary, I just ran into a secondary form muttsu, which still makes me wonder where mutsu came from. The best thing I can think of is analogy with hitotsu and futatsu. But why would six be subject to analogy with one and two, and not three?

    It's quite common for words that directly follow each other in an often-repeated series to affect each other, like 'four' and 'five', which shouldn't have the same initial consonants according to our PIE reconstructions.

    But wow, that's one crazy counting system. :D

  5. On second thought, wouldn't it also make a lot more sense if we'd reconstruct **séjra as 'pitchfork/trident' rather than three? To me it seems more likely for Korean to generalise a trident into three than the other way around.

    Altogether, though, I'm not really sure how sound this etymology is. I'm not too well read on Altaic historical linguistics. Is there any source you could recommend to me?

  6. phoenix: "On second thought, wouldn't it also make a lot more sense if we'd reconstruct **séjra as 'pitchfork/trident' rather than three? To me it seems more likely for Korean to generalise a trident into three than the other way around."

    If we're going to use "a lot more sense", I would suggest just throwing Starostin's whims into the wastepaper basket altogether. I know I sound mean, but really I don't know what can be salvaged of any informative value from his database. None of it is very likely in my view at all. He relied too much on metathetic parlour tricks and inconsistent sound changes for me to respect his academic judgment.

    Here's a snippet from The Korean Language by Ho-Min Sohn, pages 19 & 20, concerning Korean's relationship to Altaic outside of the Starostin universe. It's published by the Cambridge University Press so it can't be too shabby, one would expect. Although, sadly, he cites Starostin a few times, which to me is like citing Mayani in Etruscan studies. Whatever, to each his own, I guess.

    To get a handle on Proto-Altaic, there are names other than Starostin to explore, such as Hamp, Miller and Vovin. Here's a summary of the Proto-Altaic à la Starostin (click here for pdf) to compare with others' accounts. I have a few issues here, such as: How can *r possibly be avoided in the proper reconstruction of PA "four" (and it is reconstructed by others)?; Why isn't *m-/*b- alternation in the 1ps pronoun recognized for what it is: regular nasal allophony in initial position in certain environments?; Why do we need a long, neurotic table of sound changes when it suffices to summarize the changes with a shorter list using standard linguistic terms?; Do we really need two forms of the oblique 2ps pronoun to explain the data?; Etc., etc. I also despise reconstructions with parentheses in them. The whole thing is a theory anyway, and the use of parentheses or wildcard phonemes tells me that the theorist in question is not really committed enough to what they are proposing.

  7. I actually could have a lot to comment upon here, but this would take up too much time and space, so let me just clear up a slight misunderstanding:

    Sergei Starostin did not reconstruct initial voiced stops in Proto-Dravidian, although he did agree with the reconstruction. Voicing was added to the Dravidian database by its compiler, George Starostin, and the authorship of the database is quite explicitly mentioned on the site. And the real reason why some of the Dravidian etyma are reconstructed there with voiced consonants is exactly the one whose lack you are deploring: rigorous methodology, namely, the small issue of regular phonetic correspondences, which is the true "focus of the science of comparative linguistics". This was also one of the subjects of my PhD thesis and a couple articles (in Russian, unfortunately - but the data speak for themselves).

    Yours truly
    G. Starostin

  8. The only misunderstanding so far is your own. Who originated these pet theories as represented on the Tower of Babel website is a moot point, and I merely said in my above article that he "represented" Dravidian this way (i.e. through this project that happens to be in his name after all). Regardless of whether one reconstructs these piecemeal roots oneself or endorses them, most of the views on that website are indefensible. In fact, they are consistently (perhaps purposely) contrary to prevailing knowledge and methodology, as with this abuse of voiced stops in Proto-Dravidian, which is contrary to the prevailing reconstruction. The burden of proof is that of its few proponents, not my own or that of other conservative scholars.

    Since I've already demonstrated, with examples yet, how Starostin ignored methodology (whether as a theorist or as a promoter of other theorists who feel that Occam's Razor is too oppressive), the logical thing to do would be to either address these examples (and many others) or to forfeit the claim that this website demonstrates in all seriousness any "rigorous linguistic methodology".

    You act as though criticism is unusual, but you surely have read things like the following.

    Johanna Nichols in Current Trends in Caucasian, East European, and Inner Asian Linguistics (2003), p.208:
    "Diakonoff and Starostin 1986 assume that Hurro-Urartean is related to Nakh-Daghestanian, and assemble putative cognate sets so as to maximize similarities between the two families. Nikolayev and Starostin (1994) assume that Nakh-Daghestanian and Northwest Caucasian form a family and offer reconstructions for that putative family, again assembling putative cognate sets so as to maximize matches and similarities between ND and NWC. This method offers no guarantee that the ND subpart of an ND-NWC putative cognate set is a proper ND cognate set, and in approximately a third of the cases they are not (Smeets 1989, Nichols 1997). [...] Neither Diakonoff and Starostin, nor Nikolayev and Starostin, take on the burden of proof and discuss whether the incidence of resemblances exceeds chance expectation, nor do they present examples of the kind of shared morphological paradigmaticity that would strongly support genetic relatedness. Accordingly, the possibility of external relations to NWC and/or Hurro-Urartean must be regarded as an opinion for which no support has been offered."

    Or in a nutshell, there is far too little "rigorous methodology" in Starostin's work to be taken seriously.