2

I have a legacy database table with a mixed encoding. Some lines are UTF-8 and some lines are ISO 8859-1.

Are there some heuristics I can apply on the content of a line to guess which encoding best represents the content?

Peter Mortensen
Jerome WAGNER

2 Answers

1

Try to convert from UTF-8 first. If that fails, the content is not valid UTF-8, so you should probably convert from Latin-1 (ISO 8859-1) instead.
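As a rough illustration of that idea, here is a minimal Python sketch. It assumes each row's content is available as raw bytes; the function name `decode_mixed` and the sample rows are made up for the example and are not from the question.

```python
from typing import Tuple


def decode_mixed(raw: bytes) -> Tuple[str, str]:
    """Return (text, guessed_encoding) for one row's raw bytes.

    Hypothetical helper for illustration only.
    """
    try:
        # Strict decoding rejects byte sequences that are not well-formed UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Every byte value maps to a code point in ISO 8859-1,
        # so this fallback cannot raise.
        return raw.decode("iso-8859-1"), "iso-8859-1"


# Made-up sample rows: the same word stored in both encodings, plus pure ASCII.
rows = [
    "café".encode("utf-8"),       # b'caf\xc3\xa9'
    "café".encode("iso-8859-1"),  # b'caf\xe9'
    b"plain ascii",
]

for raw in rows:
    print(decode_mixed(raw))
```

Pure ASCII rows are reported as UTF-8 here, which is harmless because ASCII decodes identically in both encodings. The guess can be wrong in one direction: a Latin-1 string whose bytes happen to form valid UTF-8 (for example the two characters "Ã©") would be read as UTF-8, though that is uncommon in ordinary text.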

Ignacio Vazquez-Abrams
0

Compare

iconv("UTF-8", "ISO-8859-1//IGNORE", $text)

and

iconv("UTF-8", "ISO-8859-1", $text)

If the two results are not equal, consider the text to be UTF-8.

Peter Mortensen
Vladislav Rastrusny
  • 1
    What is it supposed to do? How does it work? Why will the result be different in some cases? Please respond by [editing (changing) your answer](https://stackoverflow.com/posts/5259448/edit), not here in comments (***without*** "Edit:", "Update:", or similar - the answer should appear as if it was written today). – Peter Mortensen Mar 16 '22 at 17:03