I have a string where I know that the degree symbol (°) is represented by the byte 63 (3F).

Each character is represented by a single byte.

How can I find out which character encoding is used?

Sebtm
  • Perhaps the byte really is the character '?' (byte 63), because the ODBC driver I use to extract the data does not know how to represent the character and replaces it with '?'. – Sebtm Mar 08 '12 at 12:45
  • How do you know that byte 0x3F corresponds to U+00B0 ‹°› `DEGREE SIGN`? I have a tool that reliably identifies the 8-bit encoding of a textfile, but it requires more than a single byte to do a good job. It has a language model trained on several very large English-language corpora, so does well (>99% accuracy) on such texts. You can (and should) use a different model for a different language if it is not English. – tchrist Mar 08 '12 at 15:01
  • I know for sure that it is the degree symbol. I just do not know the character encoding. – Sebtm Mar 08 '12 at 16:00

1 Answer

Almost all 8-bit encodings in modern use coincide with ASCII in the ASCII range (0x00 to 0x7F), so byte 3F hexadecimal is the question mark “?”, not the degree sign. As Sebtm’s comment suggests, this probably results from data loss at the character level. For example, software that is limited to ASCII could replace every non-ASCII character with “?”. That is not good practice, but it is possible.
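As an illustration of how such a substitution can produce byte 3F, here is a minimal Python sketch. It does not reproduce the ODBC driver's behaviour, which is unknown; it only shows that a lossy conversion to ASCII with Python's built-in `errors="replace"` handler turns the degree sign into a question mark.

```python
# The degree sign is U+00B0, which has no ASCII representation.
degree = "\u00b0"

# errors="replace" substitutes "?" for every character the target
# encoding cannot represent, instead of raising an error.
lossy = degree.encode("ascii", errors="replace")

print(lossy)          # the single byte for "?"
print(hex(lossy[0]))  # 0x3f
```

If the data went through a step like this, the original byte is gone and the encoding cannot be recovered from the 3F byte alone.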

If it were a non-ASCII byte, you could use the page http://www.eki.ee/letter/chardata.cgi?search=degree+sign to make a guess.
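The same kind of lookup can be sketched locally in Python by encoding the degree sign in a handful of single-byte codecs and comparing the result with the byte found in the data. The candidate list below is an arbitrary assumption chosen for illustration, not a complete set, and `degree_bytes` is a hypothetical helper name.

```python
DEGREE = "\u00b0"  # U+00B0 DEGREE SIGN

# Assumed shortlist of single-byte codecs shipped with Python.
CANDIDATES = ["latin-1", "cp1252", "cp437", "cp850", "mac_roman", "koi8_r"]


def degree_bytes(codec_names):
    """Map each codec name to the byte it uses for the degree sign.

    The value is None when the codec cannot encode the character
    or does not encode it as exactly one byte.
    """
    result = {}
    for name in codec_names:
        try:
            encoded = DEGREE.encode(name)
        except UnicodeEncodeError:
            result[name] = None
        else:
            result[name] = encoded[0] if len(encoded) == 1 else None
    return result


table = degree_bytes(CANDIDATES)
for name, byte in table.items():
    label = "no degree sign" if byte is None else f"0x{byte:02X}"
    print(f"{name:10} {label}")
```

None of these codecs yields 0x3F, which is consistent with the byte being a substituted question mark rather than an encoded degree sign.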

Jukka K. Korpela