I am using chardet to detect the encoding of text files, including Italian ones. The problem is that it consistently detects their encoding as iso-8859-2, while the correct result would be iso-8859-1. Does anybody know a fix? My locale is set to Polish. Could that influence the detection?

twowo
  • Since iso-8859-2 is for Eastern European languages, I would say that yes, that probably influences the detection. Which method do you use to detect the encoding? – Junuxx Oct 10 '12 at 15:36
  • Junuxx - I am using a 'detect' method e.g. chardet.detect(text) – twowo Oct 10 '12 at 15:49
  • I recommend reading the accepted answer in this [question](http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file). – Pedro Romano Oct 10 '12 at 17:23

1 Answer

chardet doesn't support iso-8859-1, which is why it isn't detecting it. For the supported character encodings, see chardet's homepage: http://pypi.python.org/pypi/chardet.

I use the Linux program 'file' to get the character encoding of different content. I'm not sure how safe it is, though; see my question "Encoding detection in Python, use the chardet library or not?". It has worked with great results for me so far.
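As a rough sketch of that approach, 'file' can be called from Python through subprocess. The helper name `detect_with_file` is made up for this example, and the `--brief` and `--mime-encoding` flags are assumed to be available, as they are in the common `file` implementation shipped with most Linux distributions:

```python
import shutil
import subprocess


def detect_with_file(path):
    """Return the encoding name reported by `file`, or None if it cannot be determined.

    Hypothetical helper: it only wraps the external `file` utility.
    """
    # `file` is an external program; give up quietly if it is not installed.
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # Strip the trailing newline; treat empty output as "unknown".
    return result.stdout.strip() or None
```

The returned value is whatever 'file' prints, so it is still a guess and worth checking against a few files whose encoding you already know.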

Btw, your locale setting should not influence the detection.
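To illustrate why the two encodings are easy to confuse, here is a small sketch that uses only Python's built-in codecs, not chardet. The sample sentence is made up for this example. Italian text encoded as iso-8859-1 also decodes as iso-8859-2 without any error, so byte validity alone cannot tell them apart:

```python
# Made-up Italian sample with accented vowels, encoded as iso-8859-1.
sample = "perché la città è più bella"
raw = sample.encode("iso-8859-1")

# Both codecs assign a character to every byte value, so neither decode raises.
as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

print(as_latin1)  # perché la città è più bella
print(as_latin2)  # perché la cittŕ č piů bella
```

Only the accented letters differ between the two results, and "é" is the same in both, so a text with few accents gives a detector very little evidence either way.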

Niklas9