Python - CSV to dict - unicode characters in keys

Question

I implemented the following CSV-to-dict method:

import csv

def csv_to_dict(path):
    res = {}
    with open(path, newline='') as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            for key, value in row.items():
                if key in res:
                    res[key].append(value)
                else:
                    res[key] = [value]

    return res

This method works as desired, with one exception. If I take the following input .csv:

Lfd. Nr.,Parameter 1,Parameter 2
1,2,3
1,2,3

On macOS, the output of the above function is:

{'\ufeffLfd. Nr.': ['1', '1'], 'Parameter 1': ['2','2'], 'Parameter 2' : ...

On Windows I get the following output:

{"\u00ef\u00bb\u00bfLfd. Nr.": ['1', '1'], 'Parameter 1': ['2','2'], 'Parameter 2' : ...

How do I get rid of these characters in the first key? Why are they there?

Desired output:

{'Lfd. Nr.': ['1', '1'], 'Parameter 1': ['2','2'], 'Parameter 2' : ...

UPDATE: My CSV file is encoded in UTF-8.

I also tried (without success):

with open(path, newline='', encoding='utf-8') as csv_file:

Solution, based on the linked duplicate SO question:

with open(path, "r", newline='', encoding='utf-8-sig') as csv_file:

The `utf-8` encoding does not skip the byte order mark. See the dupe for the encoding that will. Then proceed to hate Microsoft for trying to mainstream the bastardized version of UTF-8 that includes a byte order mark. — jpmc26, May 21 '18 at 17:35
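
For illustration, here is a minimal sketch of the whole function with the `utf-8-sig` fix applied, plus a look at how the three BOM bytes decode under different encodings. The temporary demo file and the names `BOM`, `sample` and `demo_path` are made up for this example. The Windows output in the question looks like a cp1252 decode of the BOM, but that assumes the default encoding on that machine was cp1252.

```python
import csv
import os
import tempfile

# The UTF-8 byte order mark that some editors prepend to a file.
BOM = b'\xef\xbb\xbf'

# How the same three bytes decode under different encodings:
as_utf8 = BOM.decode('utf-8')          # one character, U+FEFF
as_cp1252 = BOM.decode('cp1252')       # three characters, U+00EF U+00BB U+00BF
as_utf8_sig = BOM.decode('utf-8-sig')  # empty string, the BOM is dropped


def csv_to_dict(path):
    """Read a CSV file into a dict mapping each header to its column values."""
    res = {}
    # 'utf-8-sig' decodes UTF-8 and skips a leading BOM if one is present.
    with open(path, newline='', encoding='utf-8-sig') as csv_file:
        for row in csv.DictReader(csv_file):
            for key, value in row.items():
                res.setdefault(key, []).append(value)
    return res


# Hypothetical demo file: the sample data from the question, saved with a BOM.
sample = "Lfd. Nr.,Parameter 1,Parameter 2\n1,2,3\n1,2,3\n"
with tempfile.NamedTemporaryFile('wb', suffix='.csv', delete=False) as tmp:
    tmp.write(BOM + sample.encode('utf-8'))
    demo_path = tmp.name

result = csv_to_dict(demo_path)
os.remove(demo_path)
print(result)
```

`utf-8-sig` also reads files that have no BOM, so it should be safe to use whether or not the file was saved with one.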

Axecalever · Answer 1 · 2018-05-21T17:30:15.130

-1

Write str(key) in place of the key variable wherever it is used as res[key] inside the inner loop of the function.

edited May 21 '18 at 17:30

answered May 21 '18 at 17:05


I'm getting an error in my IDE: unresolved reference. Is it suited for Python 3? Do I have to import something? – d4rty May 21 '18 at 17:09
That is for up to Python 2.7; use str(key, 'utf-8') instead for Python 3.x – Axecalever May 21 '18 at 17:15
It says at each place: `TypeError: decoding str is not supported` – d4rty May 21 '18 at 17:25
My bad. In Python 3 a str is Unicode, i.e. it is already "decoded", so use str(key) – Axecalever May 21 '18 at 17:29
This is wrong. See the dupe target. – jpmc26 May 21 '18 at 17:35
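
A short sketch of why `str(key)` cannot help in Python 3: the key is already a `str`, so the call returns it unchanged and the BOM character stays in place. The `lstrip` line is not from this thread. It is only a possible fallback for a header that has already been read with plain `utf-8`, and the variable names are made up.

```python
# A header as it comes out of DictReader when the file is opened with 'utf-8'.
key = '\ufeffLfd. Nr.'

# str() on a str returns the same text, so the BOM character is still there.
same = str(key)
still_has_bom = same.startswith('\ufeff')

# Assumed fallback: strip the BOM character from the already-decoded header.
cleaned = key.lstrip('\ufeff')
print(repr(same), repr(cleaned))
```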
