I've been struggling with encoding for a while as I'm building a multilingual database with sqlite3 in Python. So far, I've solved everything thanks to Google and articles on Stack Overflow. I had problems with Russian, Slovenian, Polish, Spanish, French... but it's all solved, apart from this ONE file I can't fix.
I thought I had found a possible solution on this website: http://www.smashingmagazine.com/2012/06/06/all-about-unicode-utf8-character-sets/. I even found a decoder, which got me really close to solving the problem, but it only produced partially understandable Russian... (I'm sure it can help in other cases though: http://2cyr.com/decode/?lang=fr, and it also exists in English.)
But this last file is gonna be the end of me. Here's the major issue: I KNOW it's Russian because the linguist who gave it to me built it, and knows it's in Russian. BUT, the file itself looks like this:
£ËÁÀÝÅÅ UNK £ËÁÀÝÉÊ UNKA
£ËÁÀÝÅÇÏ UNK £ËÁÀÝÉÊ UNKA
£ËÁÀÝÅÊ UNK £ËÁÀÝÉÊ UNKA
£ËÁÀÝÅÍ UNK £ËÁÀÝÉÊ UNKA
£ËÁÀÝÅÍÕ UNK £ËÁÀÝÉÊ UNKA
According to my shell, it's encoded in UTF-8. I've therefore been trying to decode it as UTF-8 and re-encode it into every encoding used for Russian that I could find (ISO-8859-5, koi8_r, koi8_u, cp1251, and also cp1252). It never worked. I also tried saving the file in each of these encodings and decoding the other way around, without much success...
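For reference, here is a minimal sketch of the kind of attempt I mean. The function name `try_reencode` and the codec list are just for illustration, and the sample string is the first line of the file as shown above; it assumes the file has already been read as UTF-8.

```python
# Sample line copied from the file, as it looks after reading it as UTF-8.
sample = "£ËÁÀÝÅÅ UNK £ËÁÀÝÉÊ UNKA"

# Python codec names for the encodings listed above (illustrative list).
CODECS = ["iso8859_5", "koi8_r", "koi8_u", "cp1251", "cp1252"]


def try_reencode(text, codecs=CODECS):
    """Try to encode `text` with each codec.

    Returns a dict mapping codec name to the resulting bytes,
    or to None when the text cannot be represented in that codec.
    """
    results = {}
    for codec in codecs:
        try:
            results[codec] = text.encode(codec)
        except UnicodeEncodeError:
            # At least one character does not exist in this codec.
            results[codec] = None
    return results


for codec, data in try_reencode(sample).items():
    print(codec, "->", data)
```

With this sample, the Cyrillic codecs all raise `UnicodeEncodeError` (the accented Latin characters do not exist in them), and only cp1252 produces bytes at all.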
It has to go into a database (SQLite), and I know the required encoding for this is UTF-8. The previous Russian file I dealt with was "properly" written (in Cyrillic), and I just had to figure out which encoding to use. But here, I feel like I've tried everything, and I'm just not getting any results...
I'm actually wondering if decoding such a file is even possible, since it's not Cyrillic to start with.
Any suggestions would be welcome :)