
I write a list to a CSV file using the unicodecsv module with encoding "utf-8", but when I try to read it back with unicodecsv.reader, I still get the error: UnicodeDecodeError: 'utf8' codec can't decode byte.... I can read the file with csv.reader. Is there something that I am missing?

My code looks like this:

    with open(datapath + filename, 'wb') as csvfile:
        writer_to_csv = unicodecsv.writer(csvfile, encoding = "utf-8")
        writer_to_csv.writerows(data)

When I try to read it:

    with open(datapath + filename, 'rb') as csvfile:
        file_to_list = unicodecsv.reader(csvfile, encoding = "utf-8")

I get the error message.

Zhen Sun
  • What does `data` look like? – Simeon Visser Nov 27 '14 at 23:22
  • Does this help: http://stackoverflow.com/questions/21479589/unicodecsv-reader-from-unicode-string-not-working ? – Simeon Visser Nov 27 '14 at 23:28
  • @SimeonVisser, thanks. I saw that answer, but I already encoded the data as `utf8`, which is why I don't understand the problem. Moreover, if I use csv.reader in the second part, it works with no problem. Is there anything special about `unicodecsv` that I am missing here? For your information, the data are parsed from a large XML file. – Zhen Sun Nov 27 '14 at 23:31
  • If it can't read the file as UTF-8, then perhaps it isn't written properly as UTF-8. So although you get the error when reading, the real error may happen when writing. That's why I'm thinking you may be writing `data` incorrectly and it doesn't have the type/format that `writer_to_csv.writerows` expects. – Simeon Visser Nov 27 '14 at 23:47
  • @SimeonVisser, I think you are right. I can use encoding = "ISO 8859-1" to read the data. So is there really no way to force the data to be written as "utf8"? – Zhen Sun Nov 27 '14 at 23:54
  • If the data is currently encoded in ISO 8859-1 then you'll need to decode it to Unicode first and then encode the data to UTF-8 when passing it to write. However, I'm not sure whether that is possible so you may need to use ISO 8859-1 for both writing and reading. – Simeon Visser Nov 28 '14 at 00:14
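
The decode-then-encode step described in the last comment can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses Python 3 and the standard-library csv module rather than unicodecsv on Python 2, the sample value is hypothetical, and an in-memory buffer stands in for the real file. It is not the asker's actual data or pipeline.

```python
import csv
import io

# Hypothetical sample: bytes that were encoded as ISO 8859-1, not UTF-8.
raw_latin1 = "caf\u00e9".encode("iso-8859-1")  # the e-acute is the single byte 0xE9

# Decoding these bytes as UTF-8 fails, like the error in the question.
try:
    raw_latin1.decode("utf-8")
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False

# Step 1: decode using the encoding the bytes really have.
text = raw_latin1.decode("iso-8859-1")

# Step 2: write the text as UTF-8 (the buffer stands in for the CSV file).
buffer = io.BytesIO()
wrapper = io.TextIOWrapper(buffer, encoding="utf-8", newline="")
csv.writer(wrapper).writerow([text, "1"])
wrapper.flush()
utf8_bytes = buffer.getvalue()

# Step 3: read the bytes back as UTF-8; this now succeeds.
rows = list(csv.reader(io.StringIO(utf8_bytes.decode("utf-8"), newline="")))
```

If the rows read back match what was written, the data was genuinely UTF-8 on disk. If the original bytes are in some encoding other than ISO 8859-1, step 1 would need that encoding instead.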

0 Answers