I am trying to create a speech recognition system for the Sinhalese language. I tried to create a language model by following the answer in "Build NEW Acoustic model, Dictionary, Language model for uncommon language speech recognition". I used both the online lmtool and cmuclmtk-0.7-win32 on Windows. My input file is as follows:
එක eka
දෙක de ka
තුන thu na
හතර ha tha ra
පහ pa ha
හය ha iya
හත ha tha
අට ah ta
නවය na wa ya
After submitting it to lmtool and cmuclmtk, I got the following output:
AHTA AE T AH
DEKA D AH K AA
EKA EH K AH
HAIYA HH EY AY AH
HATHA HH AE TH AH
HATHARA HH AE TH AH R AH
NAWAYA N AO EY AH
PAHA P AE HH AH
THUNA TH UW N AH
à¶…à¶§
à¶à·”à¶±
දෙක
නවය
à¶´à·„
à·„à¶
à·„à¶à¶»
හය
එක
Both the .dic and .lm files contain the characters above. They look like garbage characters to me. What did I do wrong to get this?
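One way to check whether these characters are really garbage is sketched below in Python. It assumes (this is only a guess) that the tools wrote the Sinhala words as UTF-8 and that the files were then displayed as Windows-1252 (cp1252). The function names `as_cp1252` and `recover_utf8` are made up for illustration and are not part of lmtool or cmuclmtk.

```python
def as_cp1252(text: str) -> str:
    """Return how UTF-8 encoded text looks when misread as Windows-1252."""
    return text.encode("utf-8").decode("cp1252")


def recover_utf8(garbled: str) -> str:
    """Undo the misreading: map the characters back to bytes, decode as UTF-8.

    Raises UnicodeError if the garbled text cannot be mapped back, for
    example when a viewer silently dropped a byte it could not display.
    """
    return garbled.encode("cp1252").decode("utf-8")
```

If the assumption holds, `as_cp1252("අට")` gives the same string as the first strange line in the output, and `recover_utf8` applied to that line gives back `අට`. That would suggest the files are intact and only being viewed with the wrong encoding.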