It is my understanding that txt files do not store any encoding information, so text editors simply make an educated guess about a file's encoding and then display the file using that guess. If the editor guesses right, you get your text on the screen; if it guesses wrong, you (sometimes) get gibberish. Am I getting this right so far?

Now on to my problem. I have my bank statements in a CSV file. When I open it in MS Excel 14 (MS Office 2010), Excel recognises the encoding and displays the problematic word as "obračun". Great. When I open the file in Emacs 24.3.1, Emacs fails to recognise the correct encoding and displays the same word as "obra鑾n". Not so great.
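To make the symptom concrete, here is a minimal Python sketch. It assumes (as the answer further down finds) that the file is Windows-1250, which Python calls `cp1250`. The list of alternative guesses is only illustrative and is not what Emacs actually tried.

```python
# The word as Excel shows it, encoded with Windows-1250 (an assumption here).
raw = "obračun".encode("cp1250")

# Decode the very same bytes under several guesses, as an editor might.
guesses = {
    name: raw.decode(name, errors="replace")
    for name in ("cp1250", "iso8859_2", "cp1252", "latin_1")
}

for name, text in guesses.items():
    print(f"{name:10} {text}")
```

The bytes never change; only the interpretation does. A Western European guess such as `cp1252` turns the byte `0xE8` into "è" instead of "č".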

My question is: how do I ask Excel which encoding it used for the file, so that I can tell Emacs? Excel obviously guessed correctly.

Thanks.

dijxtra
  • *"Am I getting this right so far?"* If by *"txt files"* you mean plain text files that contain only the bytes that make up the characters, then yes. Look at your files in a hex editor when in doubt. Microsoft Office files, however, are not plain text files. E.g. .docx bears more resemblance to a zip archive than to a plain text file. [This question](https://stackoverflow.com/questions/13235189/how-can-i-determine-the-character-encoding-of-an-excel-file) might be helpful – jDo Mar 24 '16 at 09:54

1 Answer

This could be a possible answer: http://metty-mathews.blogspot.si/2013/08/excel2013-character-encoding.html

After I opened ‘Advanced’ – ‘Web Options’ – ‘Encoding’ in Excel, the "Save this document as:" field said "Central European (Windows)". That turns out to be Microsoft's name for the Windows-1250 encoding, and my file was indeed encoded in Windows-1250.

Whether this was pure luck, or whether this field really shows the encoding Excel is using to display the text, I do not know.
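Once the encoding is known, one way to sidestep the guessing altogether is to re-save the file as UTF-8. The following is only a sketch: `reencode` and the file names in the comment are hypothetical, and `cp1250` is Python's name for Windows-1250.

```python
from pathlib import Path


def reencode(src, dst, src_encoding="cp1250", dst_encoding="utf-8"):
    """Decode src as src_encoding and write the same text to dst as dst_encoding.

    Working on bytes keeps the original line endings untouched.
    """
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(text.encode(dst_encoding))
    return text


# Example call with hypothetical file names:
# reencode("statement.csv", "statement-utf8.csv")
```

Alternatively, Emacs can be told the encoding directly: `C-x RET r` (`revert-buffer-with-coding-system`) followed by `windows-1250` should re-read the buffer with that coding system.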

dijxtra
  • If you know two of three variables, couldn't you get the third? The variables being 1. encoding used, 2. byte value, 3. resulting character displayed on screen. Which encoding would you need to turn byte value X into character Y on screen? Which byte value must you have for encoding X to produce character Y on screen? Which character would appear on the screen if byte value X was interpreted using encoding Y? You know the character on screen and you could probably find the byte value using a hex editor. That would give you one or more possible encodings (they may overlap). – jDo Mar 24 '16 at 10:17
  • Good idea. Next time I'm stuck with an unknown encoding I'll try to code something like that. Thanks. – dijxtra Mar 24 '16 at 11:43
  • Yeah, it was Windows-1250, I mentioned it in the answer. – dijxtra Mar 25 '16 at 13:32
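
The "two of three variables" idea from the comments could be sketched roughly as follows. This is not code from the thread: `candidate_encodings` and `SHORTLIST` are hypothetical names, the shortlist is an arbitrary pick of Python codec names, and the bytes stand in for what a hex editor would show if the file is Windows-1250.

```python
def candidate_encodings(raw, expected, encodings):
    """Return the encodings under which raw decodes exactly to expected."""
    matches = []
    for name in encodings:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except (UnicodeDecodeError, LookupError):
            # Bytes are invalid in this encoding, or the codec is unknown.
            continue
    return matches


# Hypothetical shortlist; extend it with whatever codecs seem plausible.
SHORTLIST = ["utf_8", "cp1250", "iso8859_2", "cp1252", "latin_1"]

raw = b"obra\xe8un"  # bytes read from a hex editor (assumed here)
print(candidate_encodings(raw, "obračun", SHORTLIST))
```

With this shortlist both `cp1250` and `iso8859_2` match, because both place "č" at byte `0xE8`. That is the overlap the comment warns about, so a single word may not be enough to pin down one encoding.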