CSV parsed with pandas and cast to dict results in KeyError in last row only

Question

I have a CSV file that I’m parsing with the help of pandas and casting to a dict like so: contacts = read_csv(file_handle).to_dict(). So contacts is now a dict, and contains the following keys:

(Pdb) contacts.keys()
['FirstName', 'Title', 'LastName', 'EmailAddress']

There are a few thousand rows, and I need to iterate over them, so I tried many different for loops. One of them is:

for i, name in enumerate(contacts['FirstName'].values()):
    parsed_contacts.append(dict(
        first_name=name,
        last_name=...
    ))

(The example above has been shortened for display purposes.) This works correctly until the very last row, where it throws a KeyError on FirstName:

File "/app/importer/contacts.py", line 290, in parse_contacts
    for i, name in enumerate(contacts['FirstName'].values()):
KeyError: 'FirstName'

This makes no sense to me: it literally works for 4041 rows and fails at row 4042 with a KeyError.

Same thing if I access it in some other way:

for i in range(len(contacts['FirstName'].values())):
    parsed_contacts.append(dict(
        first_name=contacts['FirstName'][i],

I'm not sure what's going on. It's driving me crazy, because both loops work when I run them interactively in pdb:

(Pdb) for i, name in enumerate(contacts['FirstName'].values()): print name
John
Juliette
...
nan
Jose
Jesús
Frances
(Pdb) for i in range(len(contacts['FirstName'].values())): print i
0
...
4040
4041

Any ideas why this happens?

Also, why are you converting it to a dict after you convert it to a dataframe? Why not just use csv DictReader if you don't care about pandas? — Cory Madden, Aug 08 '17 at 23:37
Pandas was already available there so it felt like the most straightforward choice, but you're right I'll give DictReader a shot. — FaustoW, Aug 09 '17 at 17:09
DictReader was way better suited for my use case, if your comment @CoryMadden was an answer instead of a comment I'd accept it. — FaustoW, Aug 09 '17 at 17:16

score 1 · Accepted Answer · answered Aug 09 '17 at 06:58

This error usually happens when there is a blank line at the end of the CSV file. When the CSV reader reads the file, it picks that line up as an empty record with no keys, so a KeyError is raised. One workaround is to use the dict.get() method, which returns a default value instead of raising an error. You can look for more examples here and here. Also double-check that there is no extra line at the end of the CSV file.
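To illustrate the workaround, here is a minimal sketch using the standard library's csv.reader rather than pandas. The sample data and the parse_contacts helper below are hypothetical and not the asker's actual code. It assumes the file ends with a blank line, which csv.reader yields as an empty row, and shows how dict.get() lets that row be skipped instead of raising a KeyError.

```python
import csv
import io

# Hypothetical sample data; note the blank line at the end of the file.
SAMPLE = "FirstName,LastName\nJohn,Smith\nJuliette,Jones\n\n"


def parse_contacts(text):
    """Build one dict per CSV row, skipping rows that have no fields."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    parsed = []
    for row in reader:
        # A blank line comes back as an empty list, so this dict is empty.
        record = dict(zip(header, row))
        # record["FirstName"] would raise KeyError for the blank line;
        # .get() returns None instead, so the row can be skipped.
        first_name = record.get("FirstName")
        if first_name is None:
            continue
        parsed.append({
            "first_name": first_name,
            "last_name": record.get("LastName"),
        })
    return parsed


contacts = parse_contacts(SAMPLE)
```

Note that .get() is applied to each record here, not to the whole column mapping, so rows that do have the key are still parsed.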

Gambit1614

Oh I tried `contacts.get('FirstName', [])` but the outcome was that _none_ of the rows got parsed instead. – FaustoW Aug 09 '17 at 17:11
There seemed to be an extra line at the end of the CSV file, removing it solved the issue. However I moved over to using csv.DictReader instead of pandas. – FaustoW Aug 09 '17 at 17:17
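For reference, a csv.DictReader version of the loop might look roughly like the sketch below. The sample data and the read_contacts helper are hypothetical; only the column names and the first_name/last_name keys come from the question. csv.DictReader skips completely empty rows, so a trailing blank line does not produce an extra record.

```python
import csv
import io

# Hypothetical sample data with a trailing blank line.
SAMPLE = (
    "FirstName,Title,LastName,EmailAddress\n"
    "John,Mr,Smith,john@example.com\n"
    "\n"
)


def read_contacts(file_handle):
    """Return one dict per non-empty CSV row."""
    parsed_contacts = []
    # Each row is a dict keyed by the header line; blank rows are skipped.
    for row in csv.DictReader(file_handle):
        parsed_contacts.append({
            "first_name": row["FirstName"],
            "last_name": row["LastName"],
        })
    return parsed_contacts


parsed_contacts = read_contacts(io.StringIO(SAMPLE))
```

With a real file, the handle would come from open(path, newline="") instead of io.StringIO.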
