8 Mar 2008

Considering that many IEists believe that there is a distinct possibility of there having been contact between early IE speakers and Semitic speakers, whether direct or indirect, I think that this much-neglected subject needs to be explored further. I've always been thrilled by the classic example of Proto-Indo-European (PIE) *septḿ̥ which is a perfectly preserved fossil of Semitic grammar, with *-t- representing the Semitic feminine marker and *-m̥ marking the so-called "mimated" form, presumably a definite marker in function. I then wonder about the potential of Proto-IE and other neighbouring protolanguages to show us snippets of Proto-Semitic grammar in action. Newbies and ultra-conservative IEists may be afraid of these ideas but they have to learn to deal with it because it's not going away. We need to figure out exactly what happened instead of continuing to hold onto and exploit 'mystery' in these areas. Personally, I've developed an elaborate theory based on internal reconstruction of PIE that predicts that Mid IE's counterpart was *séptam with accent originally on the initial syllable. (The accent would later fall on the last syllable because of being affected by the accent of the numeral *h₁októu "eight" but the zerograded last syllable is testimony to its former accentuation.)

There's some confirmation that I might not be off track when it comes to another word that I strongly suspect is a loanword from Semitic into Pre-IE (specifically in Mid IE). While the consonantism for the Semitic verb "to know" is normally reconstructed as *[ydʕ], Old Canaanite Cuneiform Texts of the Third Millennium (1979) notes the uncertainty of the root's form on page 193, footnote 11: "It is not clear whether to reconstruct w or y as the initial consonant of the Proto-Semitic root." That's somewhat interesting because if the Semitic root was in fact *[wdʕ], its potential relationship to PIE *weid- becomes stronger.

Based on my theories concerning Mid IE, *waid̰a should be the antecedent form of PIE 3ps *wóide. If so and if it were yet another Semitic loanword, we should expect that this is taken from the Semitic stative form *wadiʕu with accent on the initial syllable because there are no heavy syllables here. That is, there are no non-final syllables of the form CVC or CV: which attract accent from the default initial position in Proto-Semitic as they tend to do in its daughter languages[1]. This correspondence is interesting because it would hint that prepalatalization was heard on the Semitic *-d-. (Note also the correspondence of Semitic *gadyu 'kid, young goat' and PIE *gʰáido- "goat" which also shows a curious metathesis of alveolar stop and glide.) Presumably then, the exact pronunciation of the Semitic word would be *['wæʲdɪʕʊ] and received into MIE as *waid̰a ['wajd̰ə].[2] This would then imply that perfective *wóid-e came first, and then through the process of ablaut *wēid-ti was formed, in turn becoming present *wéid-e-ti and aorist *weid-t once Anatolian went its separate way.
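The accent rule I'm leaning on here can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: I take "heavy" to mean a syllable that ends in a consonant (CVC) or in a length mark (CV:), and where several non-final heavy syllables occur I arbitrarily let the last one win, since the rule above doesn't settle that. The names `is_heavy` and `stress_index`, the vowel inventory, and the syllabifications are all hypothetical conveniences, not anything established.

```python
# Vowel symbols used in the toy syllabifications below (an assumed inventory).
VOWELS = set("aiuæɪʊ")
LENGTH_MARK = "ː"


def is_heavy(syllable: str) -> bool:
    """A syllable counts as heavy if it is CV: (ends in a length mark)
    or CVC (ends in anything that is not a vowel)."""
    return syllable.endswith(LENGTH_MARK) or syllable[-1] not in VOWELS


def stress_index(syllables: list[str]) -> int:
    """Return the index of the accented syllable.

    Assumption: the last non-final heavy syllable attracts the accent;
    if there is none, the accent stays on the default initial syllable.
    """
    # Scan the non-final syllables from right to left.
    for i in range(len(syllables) - 2, -1, -1):
        if is_heavy(syllables[i]):
            return i
    return 0


# *wadiʕu syllabified as wa.di.ʕu: every syllable is light.
print(stress_index(["wa", "di", "ʕu"]))
```

With wa.di.ʕu no non-final syllable is heavy, so the function falls through to the initial syllable, which is the accentuation assumed for the loan.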

I can't believe so few are talking about this. Doesn't anyone else find this fascinating? Doesn't anyone else think that the subject of early Indo-European loans is our priority over reconstructing Nostratic roots blindly and that it's terribly important to Indo-European studies? Drink more coffee, people. There's tonnes more to cover on this topic.

[1] What I suggest for the stress accent placement of Proto-Semitic is much like in modern Arabic. See Ryding, A Reference Grammar of Modern Standard Arabic (2005), p.37.
[2] Whether this metathesis would be due to the peculiar pronunciation of Proto-Semitic speakers themselves or due to inaccurate perception and imitation by Mid IE speakers, such a switch-up in the transmission of a word from one language to the next is perfectly natural and found elsewhere. Darya Kavitskaya in Compensatory Lengthening: Phonetics, Phonology, Diachrony (2002) relates her own story about perceptual metathesis on page 48 in footnote 8: "Indeed, in teaching Russian to American students, I noticed many instances of palatalization of the consonant being heard as some kind of diphthongal property of the preceding vowel, for example, [banʲa] 'bath' was misheard and pronounced as [baʲnʲa] or even [bajna]."


  1. Perhaps, one could also argue that the fact that it's a loanword might explain why the perfect wasn't reduplication. Though that's of course difficult to say, since it must have been a really old loan, and doesn't 'feel' foreign, once it naturalises into the language.

    I'm not really sure whether Anatolian has this root. But believe that it might be post-Anatolian.

    Then of course there's the question if there was productive reduplication at all. But at least the Graeco-Iranian dialect had it. And they bothy have a missing reduplication in this root.

  2. Phoenix: "Perhaps, one could also argue that the fact that it's a loanword might explain why the perfect wasn't reduplication."

    Excellent! I love a good argument :) However, the lack of reduplication might just have something to do with the fact that this verb is inherently stative in meaning and that CeCoC-reduplication is tied specifically to a perfective meaning (i.e. a state that is *achieved by an action*). So I think this can be explained without appeal to foreign influence.

    Phoenix: "But believe that it might be post-Anatolian."

    Then how do you address the idea that Tocharian ūwe 'learned' is from PTch *wäwen < PIE *wid-wó- or *wid-won- (Sanskrit vidvān)? Tocharian preceded the satem wave as PIE started expanding eastwards to Asia (sometime before 2500 BCE) which means that the root must be quite ancient if it exists in Proto-Tocharian.

    Phoenix: "Then of course there's the question if there was productive reduplication at all."

    Yes, this gets complicated but one way or another, as I previously said, the notion of "knowing" is by nature stative so if the whole purpose of CeCoC-reduplication was to convey the state produced specifically from an action, then *weid- wouldn't be part of it, nor would any naturally stative verb.

  3. Wow I should really start rereading the stuff I read after a night of bad sleeping, I feel ashamed for some of the terrible language mistakes I made in my last post :D

    I wasn't aware of the CeCoC- reduplication being for perfective and CoC- for stative thing.

    Are there any other perfects without reduplication? I always thought *weid- was unique in that aspect, and it seems odd to base such an idea of stative vs perfective on only one example.

    I looked it up in my Sanskrit grammar and indeed only find ved-, same for Greek eid/oid-.

  4. Ah, hell. All of us are making mistakes. They say it's part of the human disease ;) And there's so much to learn on this subject that I've been studying this for years and am still getting tripped up by some details. It's also harder when there are several different theories around. That's why I find that the only way to keep track of everything at all is to develop your own theory and to test it on a regular basis with new facts. Otherwise I find that I'm just swimming around aimlessly in someone else's theory.

    Phoenix: "I wasn't aware of the CeCoC- reduplication being for perfective and CoC- for stative thing."

    Yep, I recall reading about this in Encyclopaedia Britannica, Volume 22, Languages of the World under "Indo-European". (You can see a snippet of that text online here which affirms the relationship of the o-grade to stativeness.) This also explains nicely why nouns such as *loukos 'light' are in the o-grade (i.e. 'light' = 'that which is in a state of being bright' < *leuk- 'to be bright').

    Besides, consider *h1es- "to be" which is as stative as verbs come. It doesn't have a perfective form like **h1e-h1os-. It is known and amply published that such perfectives "typically refer to the state resulting from the action expressed by the verb" and since "to be" cannot be a result of an action (unless we're talking about "becoming"), this is yet another stative verb that lacks a perfective form.

    Usually, those whacky IEists lump statives and perfectives together in the same breath (according to "post-Indo-Anatolian" or "IE 2" grammar) which helps to confuse their students so that no one will ever know when they're wrong ;) The older stage of PIE from which Anatolian languages derive cannot have developed a strict CeCoC-perfective yet.

  5. Speaking of "7" again (and since I don't have any other method of contacting you), have you giv'n any thought on how the Proto-Uralic form *śeńćimä (or *śeńtimä) fits into this? IIRC I've seen that a number of times cited as Nostratic evidence.

    In the Uralic literature I've browsed, -mä seems to be commonly explained as a fossilized suffix of some sort (like basically all 3rd+ PU root syllables) and if this is the case then there seems to be a clear tendency for a numeral ending *-ti (FYI: 1 *ykti, 2 *kakti, 3 usually *kolme but I've seen also *kolmet, 4 *ńeljä, 5 *wixti, 6 *kuxti) so I personally have no idea nor pet hypothesis if it's thru coïncidence or indirect influence shaped similarly or if there's some sort of genealogic connection to the PIE / PS forms.

  6. So it's evidently pretty clear that stative verbs have great trouble becoming perfects. Indeed makes sense to me that *h1es- wouldn't have a perfect. But why is *woid- a 'hi-verb' then? In many ways, that word is reminiscent of the old Anatolian mi/hi-distinction. But Post-Anatolian Indo-European clearly got rid of that distinction. Still we find *woid- functioning like a hi-verb in Greek, Sanskrit and Germanic (and undoubtedly other languages of which I'm less sure).

  7. Tropylium: "Speaking of '7' again (and since I don't have any other method of contacting you), have you giv'n any thought on how the Proto-Uralic form *śeńćimä (or *śeńtimä) fits into this? IIRC I've seen that a number of times cited as Nostratic evidence."

    This is nonsense because IEists have already solved that riddle long ago. The Finno-Ugric word for "seven" is a borrowing from early Indo-Iranian by or after the third millennium BCE. Denis Sinor even goes so far as to assert that this numeral was borrowed into Finnic and Ugric separately from Proto-Aryan based on certain phonetic arguments. So Uralic **śeńtimä has no basis and Nostraticists are hardly more trustworthy authorities on this matter than actual specialists who deal intimately in these details. As well, from a simple logical standpoint, a more recent four- or five-thousand-year-old loan is much more credible than the blind belief that the word must be 15,000 years old! Clearly Occam's Razor is the decider.

  8. Sorry, that should be Proto-Indo-Aryan, not Proto-Aryan. Mea culpa.

  9. ...I'll need to apologize for wrong language family too, since upon re-checking, my etymological dictionary does have a footnote about the Ugric forms being IA loan(s). Still, no explanation there nor in your links (as far as I got within the viewing limit) on the origin of the Proto-Finno-Permic form. (The actual reconstruction staying the same.)

    Just to be clear, I wasn't expecting you to consider this evidence for Nostratic, and yes, Occam's Razor still suggests that it would be a loan too, but it's not an actual explanation on the word's origin.

  10. Tropylium: "Still, no explanation there nor in your links [...] on the origin of the Proto-Finno-Permic form. (The actual reconstruction staying the same.)"

    Since you fail to cite any references, I'm led to suspect that you're depending blindly on Sergei Starostin's extravagant Uralic root **śeŋ́ćemä for 'seven' despite Starostin's persistent rebelliousness against strict methodology.

    Denial is not a valid logical argument. You are denying that an explanation has already been given (i.e. "that the word is loaned from an early Indo-European dialect"). Yet you fail to tell us clearly what you expect from an 'explanation' in order to qualify it as such according to your own whimsical definition.

  11. Actually, no, I'm relying here on Kaisa Häkkinen: Nykysuomen etymologinen sanakirja (WSOY 2005, ISBN 951027108X) ("Etymological Dictionary of Modern Finnish") as well as Lauri Hakulinen: Suomen kielen rakenne ja kehitys (4th ed, Otava 1979, ISBN 9789514592218) ("The Structure and Development of the Finnish Language").

    I was asking if you had something in mind about the unkno'n (according to Häkkinen) origins of this form. I'm not aware of any sound laws capable of turning -pt- into Finnic -its-... tho that's not to say that they couldn't exist. (Neither author really explains the reconstructed form - it's presumably from older literature.)

    However, this is starting to get tangential at best to your original issue so maybe I'll just stop wasting your time with it...

  12. Tropylium: "I'm not aware of any sound laws capable of turning -pt- into Finnic -its-... tho that's not to say that they couldn't exist."

    I like to poke people with a metaphorical hot iron until they crack and start asking direct questions instead of vague allusions to problems. Direct questions get direct answers and implied questions only get implied answers, if at all. You passed the first test. Congrats!

    I'm not convinced by **śeńtimä because it's so widely understood that '7' is an Indo-European loan and yet your claim would deny us that connection without gain.

    We must ask "Where is the nasal *-ń- attested in this word?" The answer appears to be "Nowhere." Mark Rosenfeld (zompist.com) went through a heckuvalot of trouble to list out numbers from 1 to 10 in 5000 languages and we can see that in all the Finno-Permian languages not a single language shows any trace of this *-ń-. (Mari contains shimit but it's clear that this is just shim- plus -it judging by the neighbouring numerals inherited from Uralic, viz-it and kud-it.)

    We also know that Finnish diphthong ei (as in Finnish ei 'not') isn't necessarily caused by a former nasal. So this reconstruction appears to be highly assumptive and in denial of many things. Do you know of any proof of *-ń- in the data? If not, that phoneme has no business being there.

    Tropylium: "However, this is starting to get tangential at best to your original issue so maybe I'll just stop wasting your time with it..."

    Since this is a matter concerning comparative linguistics and this blog is about comparative linguistics, it's not wasting my time ;)

  13. Well if you ARE still interested, maybe I will continue...

    The -ń-, wherever it may originate from, is not the gist of the problem here (tho my hunch would be that it's posited to explain some correspondence for which -j- would not work). For a younger, fairly watertight intermediate reconstruction, take the Proto-Finno-Samic *śeiće-(män). I'm afraid I don't have a direct cite for this but that's what the supposed PFP form would by regular development become in the next step, and it regularly explains the present-day forms.

    The problem is that this still matches none of the phonemes of the PIE word it's supposed to stem from. Furthermore -pt- regularly comes into Finnic as -ht- (or -ps- / _i) and this even includes non-loaned, Pan-Uralic words like "hair". -ut- and -ft- and -t- and what have you for its place in various IE branches also keep regularly, with good track record, turning into not -its-. To secure the connection here, we'd need a bunch of fairly unusual sound changes to have occurred either in some "proxy" language or an older protolanguage stage (and the latter choice would almost certainly imply Pre-Proto-Uralic).

    I think it's interesting how Häkkinen, a professor on the history of Finno-Ugric studies and of Finnish itself, explicitly states that only the Ugric form has been loaned from IE. What this situation looks like to me is a Finno-Ugric/IE connection proposed ages ago, before detailed methodology, and still being propagated under inertia, even if the claim doesn't hold together for the Finnic branch under close scrutiny.

    Direct enuff yet?

  14. Tropylium: "Well if you ARE still interested, maybe I will continue..."

    I am interested. In facts. Theories are excellent... only when grounded in facts. If you ignore facts, I get bitchy. Think of me as the Judge Judy of linguistics. :)

    Tropylium: "For a younger, fairly watertight intermediate reconstruction, take the Proto-Finno-Samic *śeiće-(män)."

    Given the available data which I already gave to you and which you forgot to look at (please for the love of logic CLICK HERE), your reconstruction and the way you segment it is false.

    All you have to do is compare the numbers between Finnish and Estonian to see the error of your ways. The Finnish numbers above '6' contain what I would presume to be a pluralizing -n: seitsemän '7', kahdeksan '8', yhdeksän '9', kymmenen '10'. Estonian consistently lacks this suffix: seitse '7', kaheksa '8', üheksa '9', kümme '10'. The absence of -n is also supported by Volgaic and Permic languages. The addition of the suffix then must be a Finnish innovation.

    Ergo, your reconstruction **śeiće-(män) is prejudiced by Finnish data and missegmented as well. More likely the form was something like *śeićem which quite obviously then is a loan from an Indo-European language.

    Tropylium: "Direct enuff yet?"

    Not yet. Keep trying. I know you want to hold onto your Nostratic myth but it's time to give it up in this particular case. The odds are against you.

  15. Just to add, note also the singular/plural paradigm of numbers in Finnish and ponder on how the final -n is deleted when -t is attached in the plural in numbers above '6'. Why would that be unless -n were indeed a recent, interloping suffix? Why is '50' in Finnish viisikymmentä/viisikymmenet instead of *viisikymmenen/*viisikymmenenet if -n were truly an archaism in these numerals?

    I rest my case. ;)

  16. If I'm being unclear: I'm not arguing anything about the segmentation; the parenthesized part is simply the part I'm not arguing as "watertight". Yes, the /m/ seems fairly secure too. Yes, the /n/ could very well be a Finnish innovation. These are both irrelevant to my question.

    Also, I'm not "holding onto Nostratic" here, I've yet to be convinced with the hypothesis in any direction. Just because I propose something doesn't mean I necessarily support the proposal. This, as you ought to kno, is called "thinking out loud" or "testing hypotheses". (Or are you testing your metaphorical hot iron again? I can't tell.)

    Point is, I'm not trying to convince you of anything, I'm hoping to be myself convinced of that being an IE loanword. But said "obvious loan" keeps looking non-obvious since no relevant Indo-European language, at least going by the linked list that you seem to trust in, contains a palatalized initial, a medial affricate, or a non-labial medial. This is far too much to chalk up to random irregularities or modification at loaning. These issues need to be addressed if this is to be established as a loan.

  17. Tropylium: "But said 'obvious loan' keeps looking non-obvious since no relevant Indo-European language, at least going by the linked list that you seem to trust in, contains a palatalized initial, a medial affricate, or a non-labial medial."

    Alright, message received. Thanks for clarifying and thank you for being patient with my stubborn bitchiness. I think the above quote is the heart of your issue here.

    Considering examples of loans such as Japanese マクドナルド (Makudonarudo) for "McDonald's", I don't think that you've fully considered all the possibilities. In this rather extreme case, Japanese cannot tolerate the clusters of the English language and is compelled to insert vowels to ease pronunciation and to naturalize the word into the stricter phonotactical rules of Japanese. In various ways, languages do this all the time when absorbing foreign vocabulary but with different phonotactic rules.

    So I have to admit that Indo-Iranian is probably not an optimal option to explain the reflexes in Finno-Permian languages and I'm frankly not well-versed in these languages, but I cannot believe that such a similar word for '7' could come from anywhere but an IE language nonetheless.

    The absence of *p in Finno-Permian does not imply automatically that *p is absent in the donor IE language but may be a matter of phonotactic adjustment like my above example in Japanese. The initial sibilants correspond to IE *s, the medial affricates correspond to IE *t and final *-m is also reflected in the Finno-Permian data. The entire consonantal skeleton of an IE word is present in these Uralic languages. This is why I say that the odds are against you to deny anything but an IE origin.

    What is the most optimal etymological solution but this? It doesn't appear to me that a denial of this etymology is really the optimal solution.