Personal website + blog of Ambarish Sridharanarayanan.
Indian languages are written in a diverse set of scripts, most of them derived from the Brahmi script rather than the Phoenician script. These scripts neither look like Latin nor follow the familiar A, B, C, … ordering. Further, Indian languages have many phonemes absent from the languages Latin was traditionally used for. As a consequence, many of these scripts have far more than 26 base glyphs (not to mention ligature forms). Mapping these to Latin characters becomes important for two distinct uses:
While Unicode encodes most popular scripts used for Indian languages, and even some rare ones, Latin letters continue to be used to represent Indian languages everywhere. Theyʼre used in email, in SMS messages, in web‐pages, in file‐names; pretty much ubiquitously. But how does one map the diverse phoneme set (between 30 and 50 for most Indian languages) onto the 26 letters of the Latin alphabet?
[Likewise, physical keyboards with a QWERTY layout dominate the world; how to allow combinations of characters on the QWERTY keyboard to represent the diverse character sets in Indian languages? Iʼll address this in another post.]
As usual, there are many options.
ISO‐15919, an international scholarly standard, and a few other schemes – IAST, Hunterian, National Library at Kolkata, ALA‐LC – use diacritic (accent) marks over or under Latin characters. Harvard‐Kyoto, Velthuis, ITRANS, SLP1, WX, VedaType and ISO‐15919ʼs limited character set option are schemes that restrict themselves to 7‐bit ASCII, using capital letters and punctuation characters to make up the difference.
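The split between the two families is easy to check mechanically. The snippet below is only a sketch: the helper name `is_seven_bit_ascii` is made up for illustration, and the sample strings are the values for ञ् and ड् from the comparison table in this post.

```python
def is_seven_bit_ascii(text: str) -> bool:
    """Return True if every character in `text` is in the 7-bit ASCII range."""
    return text.isascii()  # str.isascii() is available from Python 3.7


# Romanisations of ञ् and ड् in three of the schemes compared in this post.
samples = {
    "ISO-15919": ["ñ", "ḍ"],
    "ISO-15919-lcs": ["~n", ".d"],
    "Harvard-Kyoto": ["J", "D"],
}

for scheme, values in samples.items():
    ascii_only = all(is_seven_bit_ascii(v) for v in values)
    print(f"{scheme}: {'ASCII only' if ascii_only else 'needs diacritics'}")
```

A check like this says nothing about how readable a scheme is; it only tells you whether the text will survive a channel that is limited to ASCII.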
For example, here are the same Sanskrit characters in a few sample schemes:
Devanagari | ISO‐15919 | Hunterian | ISO‐15919‐lcs | Harvard‐Kyoto | ITRANS |
---|---|---|---|---|---|
आ | ā | ā | aa | A | aa/A |
ऋ | r̥ | ri | ,r | R | RRi/R^i |
ए | ē | e | ee | e | e |
ं (anusvāra) | ṁ | m | ;m | M | M |
ख् | kh | kh | kh | kh | kh |
ञ् | ñ | n | ~n | J | ~n/JN |
ड् | ḍ | d | .d | D | D |
श् | ś | sh | ;s | z | sh |
What a mess!
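To make the mess concrete, here is a rough sketch that treats the table above as data and converts a single token from one scheme to another. The names `SCHEMES`, `ROWS` and `convert` are invented for this illustration; the table covers only eight characters, and where ITRANS offers two spellings only the first is kept, so this is nowhere near a real transliterator.

```python
# Column order follows the comparison table above.
SCHEMES = ("iso15919", "hunterian", "iso15919_lcs", "harvard_kyoto", "itrans")

# One row per Devanagari character, copied from the table.
# "r\u0325" is r followed by COMBINING RING BELOW.
ROWS = {
    "आ": ("ā", "ā", "aa", "A", "aa"),
    "ऋ": ("r\u0325", "ri", ",r", "R", "RRi"),
    "ए": ("ē", "e", "ee", "e", "e"),
    "ं": ("ṁ", "m", ";m", "M", "M"),
    "ख्": ("kh", "kh", "kh", "kh", "kh"),
    "ञ्": ("ñ", "n", "~n", "J", "~n"),
    "ड्": ("ḍ", "d", ".d", "D", "D"),
    "श्": ("ś", "sh", ";s", "z", "sh"),
}


def convert(token: str, source: str, target: str) -> str:
    """Map one romanised token from `source` to `target` using ROWS.

    Raises ValueError if the token is unknown or maps to more than one target.
    """
    src, dst = SCHEMES.index(source), SCHEMES.index(target)
    matches = {row[dst] for row in ROWS.values() if row[src] == token}
    if len(matches) != 1:
        raise ValueError(f"cannot convert {token!r} from {source} to {target}")
    return matches.pop()


print(convert("z", "harvard_kyoto", "iso15919"))  # prints ś
```

Even with this tiny table, the same sound is spelt `z`, `sh`, `;s` or `ś` depending on the scheme, and any converter needs one lookup table per pair of schemes or a common pivot.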
Iʼm going to address what I think should be the hallmarks of a good scheme for representing Indian language text using Latin characters – how one can figure out whether such a scheme was thoughtfully, carefully designed and not thrown together on an unrelated Usenet group. (Indian languages being phonetic, Iʼm sometimes careless about the distinction between a phoneme and a written character. I use the word “character” for both; the meaning should be clear from context.)
Hereʼs my prioritised list of features a good Indian language representation scheme should have:
[…] `f` for the Sanskrit velar nasal (ङ्), as in the WX notation.

[…] `kevala`, but its Tamil borrowing is represented as `kēvala`.

[…] `RRi` for a syllabic alveolar trill; ITRANSʼs `x` for a conjunct consonant and `GY` or `dny` for another conjunct consonant are all misspellings (for phonetic languages, misspellings and mispronunciations go together!)

Only one of the standard schemes, ISO‐15919, fulfils 6 of the 7 requirements completely and the 7th partially (it uses punctuation, but only to represent some combinations that cannot normally occur in the language). In addition to being well thought‐out and sane, it is an international standard and is widely used in scholarly publications. Its one drawback is that the official standard is not available free of cost; instead, ISO charges more than $100 for an electronic copy. However, itʼs fully documented at Dr. Anthony Stoneʼs website and is usable today. I use it every day, and so should you!