Italian detected as iso-8859-2

Question

I am using chardet to detect the encoding of text files, some of which are in Italian. The problem is that it consistently reports their encoding as iso-8859-2, whereas the correct result would be iso-8859-1. Does anybody know a fix? My local language is set to Polish. Could that influence the detection?

Since iso-8859-2 is for Eastern European languages, I would say that yes, that probably influences the detection. Which method do you use to detect the encoding? — Junuxx, Oct 10 '12 at 15:36
Junuxx - I am using a 'detect' method e.g. chardet.detect(text) — twowo, Oct 10 '12 at 15:49
I recommend reading the accepted answer in this [question](http://stackoverflow.com/questions/436220/python-is-there-a-way-to-determine-the-encoding-of-text-file). — Pedro Romano, Oct 10 '12 at 17:23

score 1 · Accepted Answer · edited May 23 '17 at 11:43

chardet doesn't support iso-8859-1, which is why it's not detecting it. For the supported character encodings, see chardet's homepage: http://pypi.python.org/pypi/chardet.

I use the Linux program 'file' to get the character encoding of different content. I'm not sure how safe it is (see my question "Encoding detection in Python, use the chardet library or not?"), but it has worked with great results for me so far.
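For anyone who wants to call 'file' from Python, here is a minimal sketch. It assumes the `file` binary is on the PATH and accepts the `--brief` and `--mime-encoding` options; the helper name `detect_with_file` is made up for this example and is not part of any library.

```python
import shutil
import subprocess


def detect_with_file(path):
    """Return the encoding name reported by `file`, or None if unavailable."""
    # Assumption: the `file` utility is installed; bail out if it is not.
    if shutil.which("file") is None:
        return None
    # --brief drops the file name from the output,
    # --mime-encoding prints only the encoding (e.g. "iso-8859-1").
    result = subprocess.run(
        ["file", "--brief", "--mime-encoding", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

Calling `detect_with_file("some_italian_file.txt")` (a hypothetical file name) returns whatever string `file` prints, so treat the result as a hint rather than a guarantee.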

Btw, your local language should not influence the detection.
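To illustrate why the bytes, rather than the locale, are what matters here, the sketch below uses only the standard library. The sample sentence and the helper name `decode_both` are made up for illustration. It shows that Italian text encoded as iso-8859-1 is also valid iso-8859-2: nothing in the bytes themselves rules out either encoding, so a detector presumably has to guess from how plausible the resulting letters look.

```python
def decode_both(data):
    """Decode the same bytes as iso-8859-1 and as iso-8859-2."""
    return data.decode("iso-8859-1"), data.decode("iso-8859-2")


# Italian accented letters, encoded the way the question's files should be.
sample = "Perché la città è così più giù".encode("iso-8859-1")

# Neither decode raises: every byte maps to some letter in both code pages.
as_latin1, as_latin2 = decode_both(sample)

# The iso-8859-2 reading swaps most accented vowels for Central European
# letters (for example è becomes č), while é happens to be the same in both.
```

Printing `as_latin1` and `as_latin2` side by side makes the difference visible. If you already know the files are Italian, decoding them as iso-8859-1 directly avoids the guess altogether.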
