I have a decompiled stardict dictionary in the form of a tab file
κακός <tab> bad
where <tab>
signifies a tabulation.
Unfortunately, the way the words are defined requires the query to include all diacritical marks. So if I want to search for ζῷον, I need to have all the iotas and circumflexes correct.
Thus I'd like to convert the whole file so that the keyword has the diacritic removed. So the line would become
κακος <tab> <h3>κακός</h3> <br/> bad
I know I could read the file line by line in bash, as described here [1]
while read line
do
command
done <file
But what is there any way to automatize the operation of converting the line? I heard about iconv
[2] but didn't manage to achieve the desired conversion using it. I'd best like to use a bash script.
Besides, is there an automatic way of transliterating Greek, e.g. using the method Perseus has?
/edit: Maybe we could use the Unicode codes? We can notice that U+1F0x
, U+1F8x
for x < 8
, etc. are all variants of the letter α. This would reduce the amount of manual work. I'd accept a C++ solution as well.
[1] http://en.kioskea.net/faq/1757-how-to-read-a-file-line-by-line
[2] How to remove all of the diacritics from a file?