Showing posts with label phonology. Show all posts

28 Jan 2010

How to make a mockery of Proto-Japanese

"Nikolayev and Starostin 1994 offer many putative cognate sets and reconstructions for Nakh-Daghestanian and its branches, as part of a reconstruction of a putative North Caucasian comprising Northwest Caucasian and Nakh-Daghestanian. These two families have not in fact been shown to be related. Nikolayev and Starostin proceed by assuming relatedness and then assembling cognate sets so as to maximize recurrent correspondences." (Berkeley Linguistics Society. Proceedings of the Annual Meeting of the Berkeley Linguistics Society (1975), p.12)
Every once in a while I still get a few zealots from the long-ranger camp coming to the defense of Sergei Starostin's work with misguided and condescending comments to the effect of "How dare you criticize Starostin!" and "But look at all these correspondences". As to the first line of reasoning: a theory above rational criticism is called a cult. I don't do cults, nor do I respect their followers. I also find it frustrating to talk to someone who persists in confusing academic criticism with personal attack and, by so doing, creates arguments out of nothing. As to the second line of reasoning, it's the very implausibility of Starostin's correspondences that I object to, so showing me more of his paltry 'sound correspondences' is merely irritating and grievously wastes all of our time.

Recently, one such comment concerned the origin of Japanese numerals, and it was evident that the commenter had failed to absorb any of the simple facts I explained in The hidden binary behind the Japanese numeral system. So with guiltless glee I dropped his comment in the wastebasket with all the other troll trash. However, it's a chance to go over more pedantic silliness written by doctors of linguistics who should know better. Be brave; be self-sufficient; be curious; open your eyes wide and keep your brain closed to fantasy.

On page 341 of Choi, Japanese/Korean Linguistics, Vol. 3 (1993) (see link), in the article Notes on Some Japanese-Korean Phonetic Correspondences by Alexander Vovin, we see a list of reconstructed numerals comparing Proto-Japanese, Proto-Manchu-Tungus and Proto-Altaic. Notice, reader, how Proto-Japanese *mi 'three' is claimed to come from *ñi, and that the attachment of *[ñ] to the Proto-Manchu-Tungus etymon is unexplained and ad hoc. Add to this the fact that a change of [ɲ] to [m] neighbouring a front high vowel is absurd and completely unmotivated from the perspective of rational notions of phonology. Compare also Starostin's *ŋ[i̯u] with Vovin's *ñïl₁ï and note how fellow Altaicists can't agree even on numerals or on which cognates to use. We're led to believe that *mu 'six' is from *ñu as well, and one might remark how curious it is that so many marked phonemes are being reconstructed for simple number terms. God knows why both his *-l₁ï in Altaic 'three' and *-ŋu in 'six' so conveniently disappear in Japanese, why *ñ- in '3' and *i- in '5' need to be added in Proto-Manchu-Tungus to smooth things over, et cetera ad absurdum. Indeed, maybe only God could explain such absurdity; one absurdity for another. Meanwhile, the *d- in 'four' and the same *d- in 'eight' are derived from different sources, which ignores the unignorable vowel harmony inherent in the attested Old Japanese set (i.e. mi- '3'/mu- '6'; yö- '4'/ya- '8')! No reasonable person can credibly claim that the consistent 'binary' pattern in Japanese numbers simply arose by chance from originally dissimilar roots, as Vovin, Starostin et alia are effectively claiming by the shape of these reconstructions.

In other words, theory trumps facts just like we find in all religions. It's amazing what gets past peer review (assuming such a thing was ever effective in weeding out nonsense).

25 Nov 2007

How NOT to reconstruct a protolanguage

I wrote an article last month, The Tower of Babel, which was a non-exhaustive critical assessment of the late Sergei Starostin's grandiose online language project that limps on today through the efforts of surviving project members. A recent troll on that page under an unconvincing disguise of "G.Starostin" sent me two messages, one visible because it was civil if misguided, while the second was abusive and thrown in the trash after I took note of his IP address. In case anyone was confused, my blog isn't a mouthpiece for proto-world rhetoric and I'm an ardent defender of mainstream linguistics despite my moderate interest in long-range linguistics. It suffices to reject the Tower of Babel project based simply on its consistent use of outdated and even disprovable information. Take its Indo-European database, infected with Julius Pokorny's 1950s reconstructions, which notoriously neglected to reconstruct laryngeals to properly account for reflexes in Anatolian languages like Hittite, Luwian and Lycian. When a word halχ is assumed a priori to mean “10” in Etruscan purely by eyeballing texts and ripping words out of context in order to reject what is already established to be śar (cf. Bonfante, Reading the Past - Etruscan (1990), p.61), Starostin's supplied pdf entitled Etruscan numerals: Problems and Results of Research by S. A. Iatsemirsky[1] is not credible enough to identify either the problems or the results of serious Etruscan research. Its Dravidian database is full of largely unaccepted reconstructions using voiced stops that are not proven to be necessary in that proto-language[2]. Then the addition of a Nostratic database and "Long Range Etymologies" is sure to add to the air of mediocrity of the website, putting the cart before the horse in light of the numerous mistakes regarding the more accepted languages and language families I just mentioned.
This is all on top of the decidedly negative assessments of North Caucasian pushed by Sergei Starostin during his lifetime (Johanna Nichols, Current Trends in Caucasian, East European, and Inner Asian Linguistics (2003), p.208). I personally believe in an efficient use of time. So if it's proven that this website is consistently at odds with the mainstream, one would be wise to obtain a higher quality of information elsewhere.

On that note, it's important to discuss how NOT to reconstruct a protolanguage so that we're all on the same page and can more easily distinguish between real linguists and narrow-minded loons, whether online or in print. Considering that even Merritt Ruhlen of "Proto-World" infamy[3] obtained his PhD from Stanford University, it's important not to be deceived by academic status. Theories can be ill-conceived no matter who one is or claims to be. So let's go through my cheeky list of important strategies that we can follow (using examples from the Tower of Babel project) if we want to isolate ourselves and be rejected by all universities around the world.

1. Use "phonemic wildcards" obsessively!
Cast the net wider and you might catch something!

The abuse of mathematical symbols like C, V, [a-z], (a/é/ö), etc. is an excellent way to make your idle conjecture look like a valid theory. It might be called "reconstruction by parentheses" since parentheses are either explicitly shown or hidden by a single variable. An example of this is *k`egVnV (claimed to be the Proto-Altaic word for "nine" in the Tower of Babel database). Obviously, if V represents all possible vowels in this proto-language and there are, say, ten of them possible in either position, then the fact that there are two wildcards in the same word means that the word represents a humongous, two-dimensional matrix of ONE HUNDRED possible combinations (10*10=100):

*k`egana, *k`egena, *k`egina, *k`egüna, *k`egïna, etc.
*k`egane, *k`egene, *k`egine, *k`egüne, *k`egïne, etc.
*k`egani, *k`egeni, *k`egini, *k`egüni, *k`egïni, etc.
*k`eganü, *k`egenü, *k`eginü, *k`egünü, *k`egïnü, etc.
etc.

Since no single form is actually being posited when wildcards are present, any claim of regular correspondence by such a theorist can be easily identified as fraud. If such linguists can't take themselves seriously enough to hypothesize a structured and testable theory, why then should we take them seriously in turn?
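
The arithmetic above is easy to check mechanically. What follows is only a minimal sketch: the ten-vowel list is a made-up inventory (the database does not tell us which vowels V stands for), and expand_wildcards is a hypothetical helper name, not anything from the Tower of Babel site.

```python
from itertools import product

# Assumption: a hypothetical ten-vowel inventory, purely for illustration.
VOWELS = ["a", "e", "i", "o", "u", "ü", "ö", "ï", "ä", "ë"]

def expand_wildcards(template: str, wildcard: str = "V", fillers=VOWELS) -> list[str]:
    """Return every concrete form hidden behind the wildcards in `template`."""
    slots = template.count(wildcard)
    forms = []
    # One tuple of fillers per way of filling all the wildcard slots.
    for combo in product(fillers, repeat=slots):
        it = iter(combo)
        forms.append("".join(next(it) if ch == wildcard else ch for ch in template))
    return forms

forms = expand_wildcards("*k`egVnV")
print(len(forms))  # 100
print(forms[:3])   # ['*k`egana', '*k`egane', '*k`egani']
```

Every extra wildcard multiplies the count again by the size of the inventory, which is the whole problem: the "reconstruction" is a set of a hundred candidate words, not a word.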

Other hilarious examples of wildcard fairy tales on the Tower of Babel site include Nostratic *cUKV ( ˜ č`-) "bundle" (in other words, all four are wildcards... jackpot!), Dravidian *kaṬ- "to cut into pieces" (universal onomatopoeia, anyone?), Semitic *ʔVrib- "tie (a knot)" (based on a single language, Arabic) and North Caucasian *ƛ̣_VẋwV ( ˜ Ł_-)̆ "rake" (wow, the number of possible permutations in this wildcard buffet is positively mindboggling! 200 perhaps?).

2. Ignore Occam's Razor and never seek logical justification for your ideas!
If an exotic phoneme gives you an orgasm, reconstruct it!

Most longrangers ignore Occam's Razor or fail to apply it in all aspects of their budding theory. It's easy to understand why it's not valid to reconstruct a sound in a proto-language which shows no regular correspondence in its daughter languages. However, even when one has justified a phoneme with evidence, one still has to justify the plausibility of the larger sound system that it's a part of. So if you have greater evidence for a palatal *ź than you do for its plain counterpart *z, you still have a problem to solve (cf. phonemic markedness). If pronouns and common affixes use the more complicated sounds of the inventory of your proto-language, you still have a problem since this goes against the trend in languages we observe throughout the world, a reason that Allen Bomhard used to reject Illich-Svitych's reconstruction of Nostratic (e.g. Illich-Svitych and Dolgopolsky reconstructed the 2ps pronoun starting with the symbol *ṭ-, an ejective rather than its plain counterpart). This is how Occam's Razor works. In all aspects of our theory, we must abide by the simplest answer possible. Whenever you hear an argument like "Yeah, but, there's this language in some remote part of Africa with 30 speakers that uses a really rare sound or does something else that's really rare just like in my theory!" then you know that you're not dealing with someone in their right mind. Occam's Razor avoids unnecessarily exotic solutions at all times and teaches us not to confuse "minute possibility" with "convincing probability". For example, Klallam is certainly an existing spoken language, but there's also no doubt that its sound system and consonant clusters are very rare. So Klallam is something that your proto-language should not look like until you have solid proof (i.e. numerous regular sound correspondences) to back it all up.

By searching in the Tower of Babel's North Caucasian database for words beginning with sibilants, we get the following screwy search results. As of today, only one word with plain *z- in initial position is to be found, namely the first person pronoun claimed to be *zō, despite the fact that there are two instances of *ź- and *ž-. This means that plain *z- is outnumbered 3 to 1 by the comparatively more exotic counterparts with palatalization, labialization, clusters, etc. Even worse, there are only two instances of plain *s- among twelve roots starting with unvoiced sibilants. So plain phonemes are in the minority, as we would find if we were reconstructing a science-fiction language. Starostin's North Caucasian consistently defies any rational structure or common sense and is a perfect example of diacritic overkill.

3. Make pages and pages of "correspondence tables"
They're sure to impress your family members!

"Correspondence tables" are lists of sounds in the daughter languages of a hypothetical proto-language proposed to prove regular correspondence and thus genuine relationship. So we can say that Germanic *þ often corresponds to Latin t, as Jacob Grimm remarked in 1822, showing that Germanic and Latin are part of the Indo-European family of languages. However, language isn't that simple and, far more often than not, there are numerous exceptions to such simplistic equations. For example, the word 'eight' is octo in Latin and yet *ahtōu with a *t in Germanic. This is because the stop fails to be weakened to a fricative after another stop. What good then are correspondence tables when we can save time and space by actually describing sound changes and their processes? For some reason, Nostraticists and other longrangers like to use these at every turn, as does Sergei Starostin. These childishly repetitive tables simply waste pages and pages of paper and bandwidth without being terribly informative, but they're certainly an excellent way to make your book look thicker and impress your family.
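
To show what "describing the sound change" buys you over a table, here is a toy sketch of a single conditioned rule: t becomes þ unless another obstruent stands immediately before it. Everything in it is an assumption made for illustration: the obstruent set is a crude simplification, shift_t is a hypothetical name, and the input strings are schematic stand-ins rather than real reconstructed pre-Germanic forms.

```python
# Assumption: a crude, simplified set of obstruents for this toy example.
OBSTRUENTS = set("ptkbdgsh")

def shift_t(word: str) -> str:
    """Turn t into þ, except when the preceding sound is another obstruent."""
    out = []
    for i, ch in enumerate(word):
        blocked = i > 0 and word[i - 1] in OBSTRUENTS
        out.append("þ" if ch == "t" and not blocked else ch)
    return "".join(out)

# Schematic inputs only, not actual reconstructions.
print(shift_t("tres"))  # þres
print(shift_t("okto"))  # okto
```

One rule with one stated condition covers both the regular outcome and the "exception" in 'eight', which a bare table equating t with þ cannot do.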

4. Remember: All critics are conspiring against you!
Beat dead horses to death and if you can't win, punch them!

You may find that your theory isn't gaining the kind of press that you had hoped and quite a few may be noticing several flaws in your theory. You may not have a single factoid in your favour to form a coherent rebuttal. This is when you bring out the big guns: ignorance combined with non sequitur. This tactic must be handled delicately however. You could try attacking your critics on the personal level, whether that be through the direct use of swearwords or through subtle mockery of your opponent. However this is a desperate last resort, more common on Yahoo! Forums or Youtube. It looks more professional however to simply ignore critics altogether while overpraising the capabilities of yourself and your associates. Using a plethora of unnecessarily sesquipedalian, multipolysyllabic megaterminology, such as "lexicostatistical", is a great tactic to conceal the weaknesses of your theories, as is treating your conjectures as proven facts in any of your publications so as to not bog down your important work with silly things like justification or common sense. Remember, all critics don't know what they're talking about. Their valid criticisms are just a devilish trick of theirs to throw you off-track and pull you off of your hobby horse.

NOTES
[1] Note that this pdf incorrectly cites TLE 295 in reference to a word zar when in fact it's properly TLE 275. Furthermore, automatically assuming that zar and śar are the same word purposely ignores phonemic distinctions in order to stroke one's pet theory. The instance of huθ-zars declined in the genitive case (TLE 191) has absolutely nothing to do with zar and everything to do with the fact that a dental stop plus the initial sibilant of attested śar (TCort ii) yield z // in this one particular instance. It's all quite understandable once one puts in the time and effort learning the basics of Etruscan phonetics.
[2] See Krishnamurti, Comparative Dravidian Linguistics: Current Perspectives (2001), p.250
[3] Visit Mark Rosenfeld's humorous but rational article on the Proto-World language and its associated failures in reasoning: Deriving Proto-World with tools you probably have at home. One of the most pointed criticisms of the proposals of Merritt Ruhlen and Joseph Greenberg (R&G) that I appreciate here is: "R&G really gain the benefit of obscurity here: how many of us can determine whether they are (unconsciously) playing the same kind of tricks with Tfaltik and Guamo as I am playing with Chinese and Quechua here?" This criticism is equally applicable to Starostin's theory of North Caucasian and his Tower of Babel project where a similar "benefit of obscurity" is being used against his readers.

UPDATES
(Feb 14 2008)
My entry The hidden binary behind the Japanese numeral system exposes another flaw in Starostin's reconstructions concerning the origin of Japanese numerals.

17 Apr 2007

How to pronounce Proto-Indo-European stops

Particularly for English-speakers unaccustomed to such phonetic novelties, there's a common question one asks when one first encounters the un-English phonology reconstructed for Proto-Indo-European: How on earth can anyone distinguish phonemes like *t, *d and *dh from each other? The first is called a voiceless stop, the second a voiced stop and the third a voiced aspirated stop. The last one trips a lot of people up.

However, no need to fear. Treat your tongue as a science lab. Find a quiet place away from prying ears who may think you've gone mad. Just have fun playing with voicing and aspiration as you pronounce a "d" over and over a little differently each time. After repeated practice, you should get a feel for it. Of course, it often helps if you can hear what the difference is. Take a look at this wonderful page on Hindi phonetics that gives visitors an audio recording of similar differences in that language:

http://phonetics.ucla.edu/course/chapter6/hindi/hindi.html

This site also has some other exotic phonemes that even more advanced linguistics students may find instructive. Personally, I just get a kick out of the phonetic jambalaya!

I'm probably lucky in that I grew up speaking English and French from childhood, so this whole conundrum wasn't so difficult for me. English and French are the two official languages here in Canada, of course, and my parents tried to make me a good Canadian. Alas, they tried. As I mastered both languages through grade school, I began to notice the subtle differences between an English "d" and a French "d" while my peers were busier outside, discovering the joys of cigarette smoking behind the garbage bin.

It turns out that the differences between stops in both languages have to do with "voicing onset". But I didn't need the technical term to know what I was hearing. What I instinctively began to sense was what I personally called a "light d" (in English) and a "heavy d" (in French). It turned out that my ears were on to something. I was hearing the subtle fact that French "d" is pronounced with a longer duration of voicing, something I perceived as a deeper, richer sound. My English d's certainly sounded weaker somehow because oftentimes they are only semi-voiced. As a child, you learn to mimic sounds in your environment very easily and only much later do you think about it all.

Some African languages push the voicing onset to the extreme, even beyond French, and create phonemes like nd and mb. One might say that the initial nasal sound is what naturally happens when voicing extends beyond just the 'd' itself. To the other extreme, some languages don't have any voicing at all but the stop may still sound like a 'd' to English-speakers if the stop happens to lack aspiration. For example, when a Mexican pronounces Spanish queso, it may sound like 'gay-so' to an Anglophone even though the sound is in fact an unaspirated, voiceless "k".
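
For the numerically inclined, the "light" versus "heavy" distinction can be pictured as a position on a scale of voice onset time (VOT), measured in milliseconds relative to the release of the stop. The sketch below is only illustrative: the 0 ms and 30 ms cut-offs are rough assumptions rather than measured boundaries, classify_stop is a hypothetical name, and the sample values are invented, not data from any study.

```python
def classify_stop(vot_ms: float) -> str:
    """Label a stop by its voice onset time (assumed, approximate cut-offs)."""
    if vot_ms < 0:
        return "prevoiced"   # voicing starts before the release: the "heavy d"
    if vot_ms <= 30:
        return "short-lag"   # little or no aspiration: the "light d", or Spanish "k"
    return "long-lag"        # clearly aspirated, like English "k" in "key"

# Invented sample values, for illustration only.
samples = {"French d": -90, "English d": 10, "Spanish k": 20, "English k": 70}
for sound, vot in samples.items():
    print(sound, classify_stop(vot))
```

Under these assumed cut-offs, the English "d" and the Spanish "k" land in the same short-lag bin, which is one way to picture why queso can sound like 'gay-so' to an Anglophone.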

(If you're interested in the ugly technical details, you may find this article right up your alley. It's called Perception of VOT and First Formant Onset by Spanish and English Speakers [pdf] from the University of Michigan. It shows us how easily speakers of a new language can mispronounce and misinterpret sounds because of the influences of their native language's phonology.)

Now if you're lucky enough to speak English and another contrasting tongue with significant differences in voicing onset, then you can begin to understand how a single language like IE could have had a distinction between this "light d" and "heavy d". It's not the only one (cf. Hindi or Thai).

Hopefully this layman's explanation will encourage a few more zany paleoglots out there to successfully revive a dead protolanguage for their own enjoyment.