2014-03-27

A custom keymap for Indian languages

As we saw in the last couple of posts, keying in Indian languages using a QWERTY keyboard requires keyboard/IME software as well as a standardised way to map the Latin alphabet to the characters of the Indian language du jour. As before, I use Google's Input Tools on Windows and Lipika on OS X. Unlike a representation format (which can use diacritics or other accent marks), a key-map can only employ the characters that can be typed on a QWERTY keyboard. So while I use ISO-15919 as the representation format, I needed a key-map as well. As in the previous post, here were my requirements:
  1. Meaningfulness
  2. Pan‐linguistic consistency
  3. Fidelity to pronunciation
  4. Modularity and symmetry
  5. Alphabet restrictions: the scheme must use only Latin characters to represent phonemes; the scheme may use punctuation marks to represent non-phonemic punctuation-like characters in the target language.
With these requirements, I set about creating a key-map I could use. I'd start from my requirements and, if the key-map ended up resembling an existing "standard", I'd just stick with that instead.

I started out by identifying characters in Tamil and Sanskrit (the 2 Indian languages I write in) based on phonetics and history; this identification process is important for pan-linguistic consistency.

Vowels and Dependents
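Sanskrit (Devanagari) | ISO‐15919 | Tamil | Key-map
- | a | - | -
- | i | - | -
- | u | - | -
- | r̥̄ | - | -
- | l̥̄ | - | -
- | e | - | -
- | ai | - | -
- | o | - | -
- | au | - | -
- | ' | - | -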

Consonants
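Sanskrit (Devanagari) | ISO‐15919 [1] | Tamil | Key-map
क् | k | க் | -
ख् | kh | - | -
ग् | g | - | -
घ् | gh | - | -
ङ् | - | ங் | -
च् | c | ச் | -
छ् | ch | - | -
ज् | j | - | -
झ् | jh | - | -
ञ् | - | ஞ் | -
ट् | - | ட் | -
ठ् | ṭh | - | -
ड् | - | - | -
ढ् | ḍh | - | -
ण् | - | ண் | -
- | ṯ | ற் | -
- | - | ன் | -
त् | t | த் | -
थ् | th | - | -
द् | d | - | -
ध् | dh | - | -
न् | n | ந் | -
प् | p | ப் | -
फ् | ph | - | -
ब् | b | - | -
भ् | bh | - | -
म् | m | ம் | -
य् | y | ய் | -
र् | r | ர் | -
- | ṛ | ழ் | -
ळ् | - | ள் | -
- | - | ல் | -
ल् | l | - | -
व् | v | வ் | -
श् | - | - | -
ष् | - | - | -
स् | s | - | -
ह् | h | - | -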

The next step was filling in the key-combinations that were "natural" and "obvious".
  1. Given the existence of short and long vowels, using lower- and upper-case letters for vowels seems natural.
  2. Naturally, any unmarked consonant in ISO-15919 can be mapped to the bare letter.
  3. Representing retroflexion by upper-casing the corresponding dental consonant is standard practice. By modularity, we can do the same for liquids and sibilants too.
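
The three rules above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function and variable names are hypothetical, and the lists cover just the letters that get filled in at this stage.

```python
# Hypothetical sketch of the three "natural" rules; all names are illustrative.

SHORT_VOWELS = ["a", "i", "u", "e", "o"]       # typed in lower case
DENTAL_STOPS = ["t", "th", "d", "dh", "n"]     # unmarked in ISO-15919, so bare letters
LIQUID_AND_SIBILANT = ["l", "s"]               # covered "by modularity"


def long_vowel_key(short_key: str) -> str:
    """Rule 1: a long vowel is the upper-cased short vowel (a -> A)."""
    return short_key.upper()


def retroflex_key(dental_key: str) -> str:
    """Rule 3: upper-case the first letter of the dental key (t -> T, dh -> Dh)."""
    return dental_key[0].upper() + dental_key[1:]


# Collect the keys produced by the rules, tagged with the rule that produced them.
stage_one = {}
for vowel in SHORT_VOWELS:
    stage_one[vowel] = "short vowel"
    stage_one[long_vowel_key(vowel)] = "long vowel"
for consonant in DENTAL_STOPS + LIQUID_AND_SIBILANT:
    stage_one[consonant] = "bare letter"
    stage_one[retroflex_key(consonant)] = "retroflex"
```

The retroflex keys this produces (T, Th, D, Dh, N, L, S) are the ones that appear in the table that follows.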

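Sanskrit (Devanagari) | ISO‐15919 | Tamil | Key-map
- | a | - | a
- | - | - | A
- | i | - | i
- | - | - | I
- | u | - | u
- | - | - | U
- | r̥̄ | - | -
- | l̥̄ | - | -
- | e | - | e
- | - | - | E
- | ai | - | ai
- | o | - | o
- | - | - | O
- | au | - | au
- | ' | - | -
क् | k | க் | k
ख् | kh | - | kh
ग् | g | - | g
घ् | gh | - | gh
ङ् | - | ங் | -
च् | c | ச் | c
छ् | ch | - | ch
ज् | j | - | j
झ् | jh | - | jh
ञ् | - | ஞ் | -
ट् | - | ட் | T
ठ् | ṭh | - | Th
ड् | - | - | D
ढ् | ḍh | - | Dh
ण् | - | ண் | N
- | ṯ | ற் | -
- | - | ன் | -
त् | t | த் | t
थ् | th | - | th
द् | d | - | d
ध् | dh | - | dh
न् | n | ந் | n
प् | p | ப் | p
फ् | ph | - | ph
ब् | b | - | b
भ् | bh | - | bh
म् | m | ம் | m
य् | y | ய் | y
र् | r | ர் | r
- | ṛ | ழ் | -
ळ् | - | ள் | L
- | - | ல் | -
ल् | l | - | l
व् | v | வ் | v
श् | - | - | -
ष् | - | - | S
स् | s | - | s
ह् | h | - | h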

6 issues remain: Dravidian alveolar consonants, the Dravidian approximant, Sanskrit nasals, Sanskrit sibilants, Sanskrit syllabic vowels, and miscellaneous rarely used dependents.
  1. Dravidian alveolar consonants: from the point of view of tongue-position, alveolar stops are intermediate between dental stops and retroflex stops. From this, a natural choice of key-combination for an alveolar stop is a juxtaposition of the keys for the corresponding dental and retroflex stops. Likewise for the alveolar liquid ல்.
  2. Dravidian approximant: based on usage, I picked 'z' as the key for the approximant ழ். The fact that non-native speakers mispronounce the approximant as a voiced sibilant adds credibility to this choice :-)
  3. Sanskrit nasals and sibilants: there are 2 remaining nasals (ङ्, ञ्) and one remaining sibilant (श्). The palatal nasal is both a palatal stop and a nasal; a natural representation combines the nasality of 'n' with the palatalness of 'j' or 'c'; we thus get 'nj' and 'nc' as possible key-combinations. By correspondence, the palatal sibilant श् is 'sc' or 'sj', and the velar nasal ङ् 'nk' or 'ng'.
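
The juxtaposition rules in this list are mechanical enough to write down as code. The following Python sketch is my own illustration, with hypothetical function names; it only derives the candidate key-combinations and says nothing about how an IME would pick between the alternatives.

```python
from typing import List


def alveolar_keys(dental_key: str) -> List[str]:
    """Juxtapose the dental key with its upper-case (retroflex) counterpart.

    Both orders are returned, since either one expresses "between dental
    and retroflex".
    """
    retroflex = dental_key.upper()
    return [dental_key + retroflex, retroflex + dental_key]


def combined_keys(manner_key: str, place_keys: List[str]) -> List[str]:
    """Prefix a manner key ('n' for nasal, 's' for sibilant) to each stop key
    that shares the desired place of articulation."""
    return [manner_key + place for place in place_keys]


alveolar_stop = alveolar_keys("t")                 # keys for ற்
alveolar_nasal = alveolar_keys("n")                # keys for ன்
alveolar_liquid = alveolar_keys("l")               # keys for ல்
palatal_nasal = combined_keys("n", ["c", "j"])     # keys for ञ्
palatal_sibilant = combined_keys("s", ["c", "j"])  # keys for श्
velar_nasal = combined_keys("n", ["k", "g"])       # keys for ङ्
```

For instance, `alveolar_keys("t")` gives `['tT', 'Tt']` and `combined_keys("n", ["k", "g"])` gives `['nk', 'ng']`.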
Looks like the consonants are done! Here they are:
Consonants
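Sanskrit (Devanagari) | ISO‐15919 | Tamil | Key-map
क् | k | க் | k
ख् | kh | - | kh
ग् | g | - | g
घ् | gh | - | gh
ङ् | - | ங் | nk/ng
च् | c | ச் | c
छ् | ch | - | ch
ज् | j | - | j
झ् | jh | - | jh
ञ् | - | ஞ் | nc/nj
ट् | - | ட் | T
ठ् | ṭh | - | Th
ड् | - | - | D
ढ् | ḍh | - | Dh
ण् | - | ண் | N
- | ṯ | ற் | tT/Tt
- | - | ன் | nN/Nn
त् | t | த் | t
थ् | th | - | th
द् | d | - | d
ध् | dh | - | dh
न् | n | ந் | n
प् | p | ப் | p
फ् | ph | - | ph
ब् | b | - | b
भ् | bh | - | bh
म् | m | ம் | m
य् | y | ய் | y
र् | r | ர் | r
- | ṛ | ழ் | z
ळ् | - | ள் | L
- | - | ல் | lL/Ll
ल् | l | - | l
व् | v | வ் | v
श् | - | - | sc/sj
ष् | - | - | S
स् | s | - | s
ह् | h | - | h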
  1. Sanskrit syllabic vowels: the Sanskrit syllabic vowels (ऋ, ॠ, ऌ, ॡ – the last one not actually used) present a problem. The mid-central vowel inherent in these is absent in European languages and thus lacks a symbol; it can, however, be described as mid-way between 'y' and 'w'. 'y' is already used up in our scheme, but 'w' is free! Using 'w' also ensures people don't mispronounce it as a front vowel. We thus get 'rw', 'Rw', 'lw' and 'Lw' respectively.
  2. Misc. dependent letters: a few dependent letters can only exist attached to a vowel: the anusvāra, the anunāsika, the visarga and its two other forms (the jihvāmūlīya and the upadhmānīya), and the āythayeṛuttu. The anusvāra is traditionally represented by an 'M', and the anunāsika by 'MM'; we can stick with those. The visarga, likewise, is an 'H'. The upadhmānīya is closest to the Latin 'f', and we can use that. The jihvāmūlīya and the āythayeṛuttu are both velar/glottal, and as such 'K' is the most suitable.
We finally have a complete key-map for vowels and dependents! Here it is:
Vowels and Dependents
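Sanskrit (Devanagari) | ISO‐15919 | Tamil | Key-map
- | a | - | a
- | - | - | A
- | i | - | i
- | - | - | I
- | u | - | u
- | - | - | U
- | - | - | rw
- | r̥̄ | - | Rw
- | - | - | lw
- | l̥̄ | - | Lw
- | e | - | e
- | - | - | E
- | ai | - | ai
- | o | - | o
- | - | - | O
- | au | - | au
- | - | - | M
- | - | - | MM
- | - | - | H
- | - | - | f
- | - | - | K
- | - | - | K
- | ' | - | '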
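
Because several keys are two characters long ('kh', 'tT', 'rw', 'MM' and so on), anything that consumes this key-map has to decide where one key ends and the next begins. One common approach is greedy longest-match, sketched below in Python. This is my own illustration rather than how Google's Input Tools or Lipika necessarily work; the names `KEYS` and `tokenize` are hypothetical, and the key set is simply transcribed from the two final tables with every alternative spelling included.

```python
# Keys transcribed from the final vowel and consonant tables (all alternatives).
KEYS = {
    # vowels and dependents
    "a", "A", "i", "I", "u", "U", "rw", "Rw", "lw", "Lw",
    "e", "E", "ai", "o", "O", "au", "M", "MM", "H", "f", "K", "'",
    # consonants
    "k", "kh", "g", "gh", "nk", "ng", "c", "ch", "j", "jh", "nc", "nj",
    "T", "Th", "D", "Dh", "N", "tT", "Tt", "nN", "Nn",
    "t", "th", "d", "dh", "n", "p", "ph", "b", "bh", "m",
    "y", "r", "z", "L", "lL", "Ll", "l", "v", "sc", "sj", "S", "s", "h",
}
MAX_KEY_LEN = max(len(key) for key in KEYS)


def tokenize(typed: str) -> list:
    """Split typed text into key-map keys, preferring the longest key at each step."""
    tokens = []
    pos = 0
    while pos < len(typed):
        # Try the longest possible key first, then shorter ones.
        for length in range(MAX_KEY_LEN, 0, -1):
            candidate = typed[pos:pos + length]
            if candidate in KEYS:
                tokens.append(candidate)
                pos += len(candidate)
                break
        else:
            # No key of any length matched at this position.
            raise ValueError(f"no key matches at position {pos}: {typed[pos:]!r}")
    return tokens
```

With this sketch, `tokenize("tamiz")` yields `['t', 'a', 'm', 'i', 'z']` and `tokenize("saMskrwtam")` yields `['s', 'a', 'M', 's', 'k', 'rw', 't', 'a', 'm']`. A side effect of greedy matching is that 'nk' is always read as the velar nasal and 'ai' as the diphthong, so a plain 'n' followed by 'k' would need some separator convention, which is outside the scope of this sketch.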

You can download the keymap for Tamil and Sanskrit from http://code.ambari.sh/keymap.

Footnotes:

1 Unfortunately, ISO-15919 does not distinguish between alveolar and dental liquids; Tamil has only the former, while Sanskrit only the latter. As such, I've had to make a few minor modifications to ISO-15919, where ற and ல are concerned. Thanks to Greg for pointing this out in the comments.

7 comments:

  1. This comment has been removed by the author.

  2. Above ற and ழ should be transliterated as ṟ and ḻ not ṛ and ṯ

    1. You're right. Sadly, ISO-15919 wrongly identifies alveolar and dental liquids, and has similar issues with the alveolar stop consonants. I've added a foot-note to clarify this. Thanks!

  3. ल् = ல் = l (lower case L) Unicode name DEVANAGARI LETTER LA/TAMIL LETTER LA/LATIN SMALL LETTER L. Can you explain why you don't equate ल् and ல்?
    Also please explain why you transliterate ற் UNICODE TAMIL LETTER RRA as ṯ and not ṟ?
    I've seen many in India transliterate ழ் as z or zh however in Unicode it is named UNICODE TAMIL LETTER LLLA
    Finally you are probably aware that in India the use of Nukta in Devanagari has allowed the Devanagari script to represent even these non-standard letters that are only standard in Tamil.
    e.g. ழ் = ऴ Tamil/Devanagari letter LLLA, ற் = ऱ Tamil/Devanagari letter RRA

    1. In Sanskrit, ल् is a dental liquid and corresponds to the dental stops त्, थ्, etc. In Tamil, ல் is an alveolar liquid and corresponds to the alveolar stops ற் and ன், and not to the dental stops த் and ந். That's why I don't equate them.

      ற் historically is an alveolar stop consonant. In many modern dialects (not in all!), it's transformed into a trill of some sort. I chose to use historically accurate representations.

      Why do you call ள், ழ், etc. "non-standard"? I think I've seen the use of these underdots in Devanagari, but I'm not sure they're popular. And why anyone would use Devanagari to represent Tamil letters is not clear to me.

    2. Thanks for your full explanation. Sorry my use of the term "non-standard" was in relation to the standard alphabets of most Indic scripts. Tamil being markedly different and these particular letters not being found in other scripts. Yes, I agree the use of Nukta in Devanagari to represent Tamil letters is up till now rare. However I believe it is meant to allow official Hindi documents to include these Tamil letters. According to the Indian constitution English is supposed to be phased out as an official language and Hindi alone is supposed to fill that slot. Of course this has so far proved to be a pipe dream and with English being such an important language of international commerce and technology it probably always will be the other official government language of communication along with Hindi.

    3. Regarding Tamil being different from the rest, I'm not sure things are as cut and dry as that. ழ் exists in Malayalam. ற் in Malayalam and Telugu as well (and in older Kannada). ள் in Sanskrit, Malayalam, Kannada, Marathi and probably others. Gurmukhi has sounds not used in Sanskrit (and therefore not in Devanagari); likewise, Sindhi too.

      Regarding adding new letters to Devanagari, I agree with you it's a pipe-dream. Scripts and languages are determined by usage and not by fiat or policy :-)
