Discover the character encoding from byte

Question

I have a string where I know that the degree symbol (°) is represented by the byte 63 (3F).

Each character is represented by a single byte.

How can I find the character encoding used?

Perhaps the byte really is the character '?' (byte 63): the ODBC driver I use to extract the data may not know how to represent the degree symbol and replaces it with '?'. — Sebtm, Mar 08 '12 at 12:45
How do you know that byte 0x3F corresponds to U+00B0 ‹°› `DEGREE SIGN`? I have a tool that reliably identifies the 8-bit encoding of a textfile, but it requires more than a single byte to do a good job. It has a language model trained on several very large English-language corpora, so does well (>99% accuracy) on such texts. You can (and should) use a different model for a different language if it is not English. — tchrist, Mar 08 '12 at 15:01
I know for sure that it is the degree symbol; I just do not know the character encoding. — Sebtm, Mar 08 '12 at 16:00

score 1 · Accepted Answer · answered Mar 08 '12 at 14:49

Almost all 8-bit encodings in modern use coincide with ASCII in the ASCII range, so byte 3F hexadecimal is the question mark “?”. As Sebtm’s comment suggests, this might result from a character-level data error. E.g., some software that is limited to ASCII could turn all other characters into “?”, which is not good practice but is possible.
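To illustrate how such a substitution could arise, here is a minimal Python sketch. It does not reproduce what the ODBC driver does (that is unknown here); it only assumes a conversion step that is limited to ASCII and replaces anything it cannot represent. The sample string `"25°C"` is made up for the illustration.

```python
# Hypothetical sample text containing U+00B0 DEGREE SIGN.
text = "25\u00b0C"

# An ASCII-only encoder that replaces unencodable characters
# substitutes "?" (byte 0x3F) for the degree sign.
encoded = text.encode("ascii", errors="replace")

print(encoded)          # the degree sign has become "?"
print(hex(encoded[2]))  # the byte at the degree sign's position
```

If this is what happened, the original byte is lost and cannot be recovered from the extracted data; the encoding would have to be checked at the source.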

If it were a non-ASCII byte, you could use the page http://www.eki.ee/letter/chardata.cgi?search=degree+sign to make a guess.
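The same kind of lookup can be sketched locally in Python. This is only a rough guess-helper: the candidate list below is an arbitrary selection of single-byte codecs shipped with Python, not a complete list, and the function name `degree_byte_by_encoding` is invented for this example.

```python
DEGREE = "\u00b0"  # U+00B0 DEGREE SIGN

# Assumed shortlist of single-byte encodings to try; extend as needed.
CANDIDATES = ["latin-1", "cp1252", "iso8859-15", "cp437", "cp850", "mac_roman"]

def degree_byte_by_encoding(candidates):
    """Map each encoding name to the byte it uses for the degree sign."""
    result = {}
    for name in candidates:
        try:
            raw = DEGREE.encode(name)
        except UnicodeEncodeError:
            continue  # this encoding has no degree sign
        if len(raw) == 1:
            result[name] = raw[0]
    return result

table = degree_byte_by_encoding(CANDIDATES)
for name, value in table.items():
    print(f"{name}: 0x{value:02X}")
```

None of these candidates maps the degree sign to 0x3F, which is consistent with the byte being a replacement “?” rather than an encoded “°”.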
