The Japanese Name Converter uses a combination of dictionary lookup, substitution rules, and machine learning to convert English characters into katakana.

For common English names, a dictionary lookup of about 4,000 English names is used. For other names, a learned substitution model trained on these names is applied instead. This method is very similar to the Transformation-Based Learner (TBL) invented by Eric Brill.

Essentially, given a list of English/Japanese name pairs, the system learns a series of substitution rules to apply to the English input in order to get the Japanese output. For instance, the first rule the system learns is to replace the letter 'L' with the letter 'R', because there is no 'L' in Japanese. After that, more subtle rules are applied, such as 'replace G with J if it's followed by an E.' Here is the full list of rules. This blog post gives more details, for those interested in a complete answer.

Hey, doofus, you messed up my name! I'm Daenerys Targaryen, and you got the last vowel wrong!

Congratulations, you took high-school Japanese. The machine learning method sometimes makes mistakes. In my own tests, it had an accuracy of about 95% on a per-character basis, but your mileage may vary. In my defense, transliteration is not an easy task, especially with a language as orthographically challenged as English. The vowel system is very irregular, and some names are even ambiguous.
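
To make the dictionary-then-rules idea concrete, here is a minimal Python sketch. It is not the converter's actual code. The names `NAME_DICTIONARY`, `RULES`, `apply_rules`, and `convert` are made up for illustration, the dictionary holds a single sample entry rather than the roughly 4,000 real ones, and the rule list contains only the two rules quoted in the text. The sketch also assumes the rules run in a fixed order on lowercased input, and it stops at a romanized intermediate form instead of producing katakana for names outside the dictionary.

```python
import re

# Hypothetical stand-in for the dictionary of common English names.
NAME_DICTIONARY = {
    "john": "ジョン",
}

# Ordered (pattern, replacement) rules. Only the two rules mentioned in the
# text are included; a learned rule list would be much longer.
RULES = [
    (r"l", "r"),       # replace L with R, since Japanese has no L
    (r"g(?=e)", "j"),  # replace G with J if it is followed by an E
]


def apply_rules(name):
    """Apply each substitution rule, in order, to the lowercased name."""
    result = name.lower()
    for pattern, replacement in RULES:
        result = re.sub(pattern, replacement, result)
    return result


def convert(name):
    """Try the dictionary first, then fall back to the substitution rules."""
    key = name.lower()
    if key in NAME_DICTIONARY:
        return NAME_DICTIONARY[key]
    # Assumption: the fallback returns a romanized form, not katakana.
    return apply_rules(key)


if __name__ == "__main__":
    for sample in ["John", "Angela", "Lily"]:
        print(sample, "->", convert(sample))
```

Because each rule sees the output of the rule before it, order matters: a later rule can depend on a substitution made earlier, which is the core idea behind a transformation-based learner.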