I would like to detect how a stream or a string is encoded before converting it: if it is not UTF-8, I want to convert it to UTF-8.

Konrad Rudolph
marko
  • Well, look at this http://stackoverflow.com/questions/90838/how-can-i-detect-the-encoding-codepage-of-a-text-file – Dewfy Sep 06 '11 at 13:39

1 Answer

You could use the WinAPI function `IsTextUnicode`, which uses heuristics to guess whether a buffer contains Unicode (UTF-16) text. Note that this can go hilariously wrong.
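
`IsTextUnicode` is Windows-only, and its real tests are more involved than anything shown here. As a portable illustration of why this kind of guessing is fragile, here is a deliberately crude heuristic in Python. The function name `looks_like_utf16le` and the 0.5 threshold are hypothetical choices, not the algorithm `IsTextUnicode` uses: it simply guesses UTF-16LE when most high bytes of the 16-bit code units are zero, which is true for ASCII-range text.

```python
def looks_like_utf16le(data: bytes, threshold: float = 0.5) -> bool:
    """Crude guess that `data` is UTF-16LE text.

    Hypothetical heuristic for illustration only (NOT IsTextUnicode's
    algorithm): in UTF-16LE, ASCII-range characters have a zero high
    byte, which sits at every odd offset.
    """
    # UTF-16 data must contain whole 2-byte code units.
    if len(data) < 2 or len(data) % 2:
        return False
    high_bytes = data[1::2]  # the high byte of each little-endian unit
    zero_ratio = high_bytes.count(0) / len(high_bytes)
    return zero_ratio >= threshold


print(looks_like_utf16le("hello".encode("utf-16-le")))  # True
print(looks_like_utf16le(b"hi"))                         # False
```

Text that is mostly outside the ASCII range (for example Cyrillic in UTF-16LE) fails this test, while binary data with many zero bytes passes it: exactly the kind of misfire the answer warns about.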

The most reliable approach is to refuse streams that arrive without external information about their encoding and, failing that, to rely on internal information such as a byte order mark (BOM) or an HTML meta tag that declares the encoding.
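
Checking for a BOM, one kind of internal information, is straightforward. A minimal Python sketch using the BOM constants from the standard `codecs` module (the function name `encoding_from_bom` is hypothetical):

```python
import codecs
from typing import Optional

# Longer BOMs must be tested first: the UTF-32LE BOM (FF FE 00 00)
# begins with the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def encoding_from_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(encoding_from_bom(codecs.BOM_UTF8 + b"abc"))  # utf-8
print(encoding_from_bom(b"plain ascii"))            # None
```

A missing BOM tells you nothing; UTF-8 files in particular usually have none. Also, a UTF-16LE stream whose first character is U+0000 would be misread as UTF-32LE, an edge case this sketch ignores.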

Konrad Rudolph
  • OK, but what if I only need to detect that the stream is not UTF-8, and then convert it to UTF-8? – marko Sep 06 '11 at 14:04
  • @marko True, in that case the above function will probably work quite well. As far as I can see, most ambiguities are with UTF-16LE. – Konrad Rudolph Sep 06 '11 at 14:09
  • +1. See also [Raymond Chen on the subject of IsTextUnicode](http://blogs.msdn.com/b/oldnewthing/archive/2007/04/17/2158334.aspx) and also [this duplicate question](http://stackoverflow.com/questions/90838/how-can-i-detect-the-encoding-codepage-of-a-text-file) – MarkJ Sep 06 '11 at 14:28
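
For the narrower case raised in the comments (detect that the input is not UTF-8, then convert it), one common approach is to attempt a strict UTF-8 decode and fall back to a legacy encoding only when that fails. A minimal Python sketch; the function name `to_utf8` and the `cp1252` fallback are assumptions, since invalid UTF-8 does not tell you which encoding the data actually uses:

```python
def to_utf8(data: bytes, fallback: str = "cp1252") -> bytes:
    """Return `data` as UTF-8 bytes.

    If `data` is already valid UTF-8 it is returned unchanged. Otherwise
    it is ASSUMED to be in `fallback` (a guess, not a detection result)
    and re-encoded as UTF-8.
    """
    try:
        data.decode("utf-8")  # strict mode: raises on any invalid sequence
        return data
    except UnicodeDecodeError:
        # cp1252 leaves a few byte values undefined; errors="replace"
        # turns those into U+FFFD instead of raising.
        text = data.decode(fallback, errors="replace")
        return text.encode("utf-8")


print(to_utf8("café".encode("cp1252")))  # b'caf\xc3\xa9'
```

Note the UTF-16LE ambiguity mentioned above: ASCII text encoded as UTF-16LE (for example `b"h\x00i\x00"`) is also valid UTF-8, because NUL bytes are legal in UTF-8, so this check passes it through unchanged. A BOM check helps only when a BOM is present.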