
I've got an interesting problem. I receive a report by email and parse the CSV with csv.DictReader like so:

import csv

report = []

# No encoding is passed here, so open() falls back to the platform default.
with open(extracted_report_uri) as f:
    reader = csv.DictReader(f)
    for row in reader:
        report.append(row)

Unfortunately the CSV contains a column called "eCPM (€)", which leaves me with a list of rows like so: {'eCPM (€)': '1.42'}

Python really does not like print(report[0]['eCPM (€)']): it raises a KeyError and refuses to accept the key containing the Euro sign.

I tried creating a unicode string with the € inside and using that as the key, but this doesn't work either. I'd like to either access the value as is (obviously) or simply get rid of the €.

The suggested duplicate's answer covers removing the BOM rather than accessing my key. I also tried report[0][u'eCPM (€)'] as suggested in the comments there. It does not work. KeyError: 'eCPM (�)'

The suggestion from the comment also doesn't work for me. Using report[0][u'eCPM (%s)' % '€'.encode('unicode-escape')] results in KeyError: "eCPM (b'\\\\u20ac')"

rikaidekinai
  • Possible duplicate of [Reliable way of handling non-ASCII characters in Python?](http://stackoverflow.com/questions/31276483/reliable-way-of-handling-non-ascii-characters-in-python) – R Nar Nov 18 '15 at 17:27
  • That key works for me in IDLE. What error do you get? – user193661 Nov 18 '15 at 18:39
  • @user193661: Exactly what I wrote, the KeyError. I am using PyCharm set to CRLF/UTF-8 under Win7 for development. That might be a part of the problem. The console also shows the € in the command as � like so: `print(reports[0][0][u'eCPM (%s)' % '�'.encode('unicode-escape')]) KeyError: "eCPM (b'\\\\u20ac')"` – rikaidekinai Nov 19 '15 at 09:36

1 Answer


After some more research I found out how to do this properly, it seems. As I've seen all sorts of issues on Google/Stack Overflow with BOM/UTF-8 and DictReader, here's the complete code:

Situation: You have a CSV file that starts with a Byte Order Mark (BOM: 0xEF, 0xBB, 0xBF) and has special characters like €äöµ@ or similar in a fieldname, and you want to read it properly so you can access the key:value pairs later on.

In my example the CSV has a fieldname eCPM (€), and this is how it works:

import csv
report = []

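# utf-8-sig decodes the file as UTF-8 and drops a leading BOM if present;
# newline='' is what the csv module documentation recommends for file objects.
with open('test.csv', encoding='utf-8-sig', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        report.append(row)

print(report[0][u'eCPM (€)'])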

Before this solution I removed the BOM with a function, but there's really no need for this. If you use open() with encoding='utf-8-sig', it'll automagically skip the BOM and properly decode the whole file as UTF-8.
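
To see what the encoding argument changes, here is a minimal sketch that decodes the same bytes three ways and prints the resulting fieldnames. The sample bytes and the helper name first_row are made up for illustration, and cp1252 is only an assumption about what a Western European Windows machine might use as its default; the real default depends on the system.

```python
import csv
import io

# Hypothetical sample: the raw bytes of a small CSV saved as UTF-8 with a BOM.
# b'\xef\xbb\xbf' is the BOM, b'\xe2\x82\xac' is the Euro sign in UTF-8.
raw = b'\xef\xbb\xbfeCPM (\xe2\x82\xac),Clicks\r\n1.42,7\r\n'


def first_row(data, encoding):
    """Decode the bytes with the given encoding and return the first CSV row."""
    text = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline='')
    return next(csv.DictReader(text))


for encoding in ('cp1252', 'utf-8', 'utf-8-sig'):
    keys = list(first_row(raw, encoding))
    # ascii() escapes non-ASCII characters, so the output is console-safe.
    print(encoding, ascii(keys))

# Only the utf-8-sig variant yields a first key that is exactly 'eCPM (€)'.
row = first_row(raw, 'utf-8-sig')
print(row['eCPM (\u20ac)'])
```

With cp1252 the first key comes out as mojibake, and with plain utf-8 it is correct except for a leading '\ufeff' (the decoded BOM), so in both cases a lookup with 'eCPM (€)' raises a KeyError.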

And with a key such as u'eCPM (€)' you can easily access the values in the generated list of rows. In Python 3 the u prefix is optional, since every str is already Unicode.
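
If the editor or console mangles the € (as PyCharm's console did for me), the following sketch may help. It prints the keys in escaped form to check what was actually read, looks the value up via the '\u20ac' escape instead of a literal €, and optionally renames the column. The in-memory sample and the names EURO_KEY and ecpm_eur are assumptions for illustration only; with a real file you would pass open(..., encoding='utf-8-sig', newline='') to DictReader instead.

```python
import csv
import io

# Hypothetical stand-in for the already correctly decoded report file.
sample = io.StringIO('eCPM (\u20ac),Clicks\r\n1.42,7\r\n', newline='')
report = list(csv.DictReader(sample))

# Show exactly which keys were read, with non-ASCII characters escaped.
print([ascii(key) for key in report[0]])

# '\u20ac' is the Euro sign; the escape does not depend on how the
# source file or the console is encoded.
EURO_KEY = 'eCPM (\u20ac)'
print(report[0][EURO_KEY])

# Optional: get rid of the Euro sign by renaming the column.
# 'ecpm_eur' is an arbitrary ASCII-only name chosen for this sketch.
cleaned = [
    {('ecpm_eur' if key == EURO_KEY else key): value for key, value in row.items()}
    for row in report
]
print(cleaned[0]['ecpm_eur'])
```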

Thanks for the comments that brought me on the right track!

rikaidekinai