43

I'm using Python 2.7.12. With this code snippet I'm saving a utf-8 csv file. I wrote the BOM (byte order mark) at the beginning of the file.

import codecs
import csv

outputFile = open("test.csv", "wb")
outputFile.write(codecs.BOM_UTF8)
fieldnames = ["a", "b"]
writer = csv.DictWriter(outputFile, fieldnames, delimiter=";")
writer.writeheader()
row = dict([])
for i in range(10):
    row["a"] = str(i).encode("utf-8")
    row["b"] = str(i*2).encode("utf-8")
    writer.writerow(row)
outputFile.close()

I want to load that csv file:

import codecs
import csv
inputFile = open("test.csv", "rb")
reader = csv.DictReader(inputFile, delimiter=";")
for row in reader:
    print row["a"]
inputFile.close()

The above code fails with KeyError: 'a'. If I print the row keys, this is how they look: [u'\ufeffa', u'b']. The BOM has been embedded into the key a. What am I doing wrong?

gbajson
Davide_sd

2 Answers

65

You have to tell open that this is UTF-8 with BOM. I know that works with io.open:

import io

.
.
.
inputFile = io.open("test.csv", "r", encoding='utf-8-sig')
.
.
.

And you have to open the file in text mode, "r" instead of "rb".
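
As a minimal sketch of the fix applied to the question's file, the snippet below writes a small BOM-prefixed CSV and reads it back through io.open with 'utf-8-sig'. It is written for Python 3; the temporary path, the three sample rows, and the `newline=""` argument are assumptions added for illustration and are not part of the original answer.

```python
import codecs
import csv
import io
import os
import tempfile

# Hypothetical temporary path standing in for "test.csv".
path = os.path.join(tempfile.mkdtemp(), "test.csv")

# Write the file the way the question does: raw BOM bytes first,
# then ';'-delimited rows encoded as UTF-8.
with open(path, "wb") as output_file:
    output_file.write(codecs.BOM_UTF8)
    output_file.write(b"a;b\r\n")
    for i in range(3):
        output_file.write("{0};{1}\r\n".format(i, i * 2).encode("utf-8"))

# Text mode plus 'utf-8-sig' strips the BOM before csv sees the header,
# so the first key is 'a' rather than '\ufeffa'.
with io.open(path, "r", encoding="utf-8-sig", newline="") as input_file:
    rows = list(csv.DictReader(input_file, delimiter=";"))

values_a = [row["a"] for row in rows]
print(values_a)
```

With the BOM removed by the decoder, `row["a"]` no longer raises KeyError.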

hvwaldow
  • Actually, I just discovered that your answer works nicely only if there aren't any special characters (à, è, ì, ...); otherwise we get a UnicodeEncodeError. Do you know if it's possible to improve your answer? – Davide_sd Oct 28 '16 at 22:31
  • Oh yes. That is a different issue. In Python 2, the csv module doesn't support Unicode input; see [https://docs.python.org/2/library/csv.html#csv-examples](https://docs.python.org/2/library/csv.html#csv-examples). `reader = csv.DictReader((l.encode('utf-8') for l in inputFile), delimiter=";")` should do the trick for you: the input file is replaced by a generator that does the encoding. – hvwaldow Oct 28 '16 at 23:48
  • Top!!! Thank you very much!!! :) You made my day with that pythonic line of code :D – Davide_sd Oct 29 '16 at 08:10
  • Didn't work in Python 3.6 when reading with a `csv.DictReader` – Dagrooms Jan 29 '18 at 21:16
  • Thank you for this answer! It worked for me with Python 3.7 with a csv.DictReader. I spent hours googling this issue before finding this answer. Wasn't aware there was a BOM encoding option: utf-8-sig. Thanks! – rcronk Oct 05 '18 at 05:03
  • Added bonus is that using utf-8-sig encoding also works for files without the bom, i.e. files that are utf-8 encoded – Tony B Mar 29 '23 at 14:39
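
The last comment's point can be checked with a short sketch that decodes the same bytes with and without a leading BOM. It assumes Python 3; the sample payload is made up for illustration.

```python
import codecs

# Made-up sample payload: a two-line, ';'-delimited CSV encoded as UTF-8.
payload = "a;b\r\n1;2\r\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + payload
without_bom = payload

# 'utf-8-sig' drops a leading BOM when one is present and otherwise
# decodes exactly like plain UTF-8.
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = without_bom.decode("utf-8-sig")
same_text = decoded_with == decoded_without

# Plain 'utf-8' keeps the BOM as the character U+FEFF, which is what
# ends up glued to the first header name.
kept_bom = with_bom.decode("utf-8").startswith("\ufeff")

print(same_text, kept_bom)
```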
23

In Python 3, the built-in open function is an alias for io.open.

To open a file encoded as UTF-8 with a BOM, pass encoding='utf-8-sig':

open(path, newline='', encoding='utf-8-sig')

Example

import csv

...

with open(path, newline='', encoding='utf-8-sig') as csv_file:
    reader = csv.DictReader(csv_file, dialect='excel')
    for row in reader:
        print(row['first_name'], row['last_name'])
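
For a self-contained Python 3 round trip, the sketch below also writes the file with 'utf-8-sig', which emits the BOM automatically, and then reads it back. The temporary path and the sample row (including its non-ASCII characters) are hypothetical and serve only as an illustration.

```python
import csv
import os
import tempfile

# Hypothetical temporary path and sample data, for illustration only.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
fieldnames = ["first_name", "last_name"]

# Writing with 'utf-8-sig' puts the BOM at the start of the file,
# so no manual write of codecs.BOM_UTF8 is needed.
with open(path, "w", newline="", encoding="utf-8-sig") as csv_file:
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, dialect="excel")
    writer.writeheader()
    writer.writerow({"first_name": "Zoë", "last_name": "Müller"})

# Confirm the BOM bytes really are on disk.
with open(path, "rb") as raw_file:
    starts_with_bom = raw_file.read(3) == b"\xef\xbb\xbf"

# Reading with 'utf-8-sig' strips the BOM again, so the header keys are clean.
with open(path, newline="", encoding="utf-8-sig") as csv_file:
    rows = list(csv.DictReader(csv_file, dialect="excel"))

print(starts_with_bom, rows[0]["first_name"], rows[0]["last_name"])
```

Because Python 3 decodes the file to text before the csv module sees it, accented characters need no extra encoding step.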
Community
Christopher Peisert