How to read several rows from a csv

Question

I have a CSV file that contains, among other things, names and phone numbers. I'm only interested in a name if it has a phone number.

with open(phone_numbers) as f:
    reader = csv.DictReader(f)
    names =  [record['Name'] for record in reader if record['phone']]

But I also want the corresponding phone number. I tried this:

user_data = {}
with open(phone_numbers) as f:
    reader = csv.DictReader(f)
    user_data['Name'] =  [record['Name'] for record in reader if record['phone']]
    user_data['phone'] = [record['phone'] for record in reader if record['phone']]

But for the second item I got an empty list. I'm guessing that the reader is a generator and that's why I can't iterate over it twice.

I tried using tuples, but it only worked this way:

user_data = {}
with open(phone_numbers) as f:
    reader = csv.DictReader(f)
    user_data['Name'] =  [(record['Name'],record['phone']) for record in reader if record['phone']]

In that case both values, phone and Name, are stored in user_data['Name'], which isn't what I want.

And if I try this:

user_data = {}
with open(phone_numbers) as f:
    reader = csv.DictReader(f)
    user_data['Name'],user_data['phone'] =  [(record['Name'],record['phone']) for record in reader if record['phone']]

I got the following error:

ValueError: too many values to unpack

Edit:

This is a sample of the table:

+--------+---------------+
| Name   | phone         |
+--------+---------------+
| Luis   | 000 111 22222 |
+--------+---------------+
| Paul   | 000 222 3333  |
+--------+---------------+
| Andrea |               |
+--------+---------------+
| Jorge  | 111 222 3333  |
+--------+---------------+

So all rows have a Name but not all have phones.

Check this answer http://stackoverflow.com/questions/5466618/too-many-values-to-unpack-iterating-over-a-dict-key-string-value-list — renno, Apr 03 '16 at 21:46
Can you clarify whether your data is one column or multiple and, if multiple, whether the phone number and name are in the same row? — PyNEwbie, Apr 03 '16 at 21:54
I think you are indicating that phone and name are in the same row — PyNEwbie, Apr 03 '16 at 21:55
@PyNEwbie It has multiple columns, and all rows have a name but not all have a phone number; that's why I'm using if record['phone'] — Luis Ramon Ramirez Rodriguez, Apr 03 '16 at 21:57
@YakymPirozhenko you mean like this: zip(record['Name'],record['phone']) ? didn't work — Luis Ramon Ramirez Rodriguez, Apr 03 '16 at 21:58
No, I mean `zip(*[(record['Name'],record['phone']) for record in reader if record['phone']])`. — hilberts_drinking_problem, Apr 03 '16 at 21:59
@YakymPirozhenko You were right zip(*), with the '*' does the work. — Luis Ramon Ramirez Rodriguez, Apr 03 '16 at 22:37
Glad to help. Also consider itertools.izip or generator to reduce memory footprint. — hilberts_drinking_problem, Apr 03 '16 at 22:48

oz123 · Answer 1 · 2016-04-03T22:13:27.820

Your guess is quite right. If you want to take this approach and iterate twice, you should use seek(0) to rewind the file between passes:

reader = csv.DictReader(f)
user_data['Name'] =  [record['Name'] for record in reader if record['phone']]
f.seek(0)   # roll back to the beginning of the file ...
reader = csv.DictReader(f)
user_data['phone'] = [record['phone'] for record in reader if record['phone']]
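
To see why the second comprehension comes back empty, here is a minimal sketch for Python 3. The in-memory file (io.StringIO) and the sample rows are assumptions made for illustration, not the asker's data. The reader consumes the underlying file as it iterates, so a second pass yields nothing until the file is rewound and a fresh reader is created.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
f = io.StringIO("Name,phone\nLuis,000 111 2222\nAndrea,\nJorge,111 222 3333\n")

reader = csv.DictReader(f)
names = [r['Name'] for r in reader if r['phone']]

# The reader is already exhausted here, so this list is empty.
second_pass = [r['phone'] for r in reader if r['phone']]

f.seek(0)                   # rewind the underlying file
reader = csv.DictReader(f)  # new reader, so the header row is parsed again
phones = [r['phone'] for r in reader if r['phone']]
```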

However, this is not very efficient, and you should try to get your data in one pass. The following should do it in one pass:

user_data = {}

def extract_user(user_data, record):
    if record['phone']:
        name = record.pop('name')
        user_data.update({name: record})

[extract_user(user_data, record) for record in reader]

Example:

In [20]: cat phones.csv
name,phone
hans,01768209213
grettel,
henzel,123457123

In [21]: f = open('phones.csv')

In [22]: reader = csv.DictReader(f)

In [24]: %paste
user_data = {}

def extract_user(user_data, record):
    if record['phone']:
        name = record.pop('name')
        user_data.update({name: record})

[extract_user(user_data, record) for record in reader]

## -- End pasted text --
Out[24]: [None, None, None]

In [25]: user_data
Out[25]: {'hans': {'phone': '01768209213'}, 'henzel': {'phone': '123457123'}}

I am not sure this is the problem - the last block reopens the file. The issue is with unpacking. — hilberts_drinking_problem, Apr 03 '16 at 21:49

PyNEwbie · Answer 2 · 2016-04-03T22:02:50.050

I think there is a much easier approach. Because it is a CSV file with column headings, as you indicate, there is a value for phone in each row: it is either empty or not. So this tests for an empty value and, if the phone is not empty, adds the name and phone to user_data.

import csv
user_data = []
# f is the path to the CSV file; 'rb' is the Python 2 mode (on Python 3 use open(f, newline=''))
with open(f,'rb') as fh:
   my_reader = csv.DictReader(fh)
   for row in my_reader:
       if row['phone'] != '':
           user_details = dict()
           user_details['Name'] = row['Name']
           user_details['phone'] = row['phone']
           user_data.append(user_details)

By using DictReader we are letting the magic happen so we don't have to worry about seek etc.

If I misunderstood and you want a dictionary, that is easy enough:

import csv
user_data = dict()
with open(f,'rb') as fh:
   my_reader = csv.DictReader(fh)
   for row in my_reader:
       if row['phone'] != '':
           user_data['Name'] = row['phone']
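
Note that assigning to the literal key 'Name' overwrites the same entry on every row, so at most one phone number survives. A hedged sketch of what was presumably intended, a mapping keyed by each row's name, is shown here for Python 3; io.StringIO and the sample rows are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample data standing in for the real file.
fh = io.StringIO("Name,phone\nLuis,000 111 2222\nAndrea,\nJorge,111 222 3333\n")

user_data = {}
for row in csv.DictReader(fh):
    if row['phone'] != '':
        # Use the row's own name as the key, not the literal string 'Name'.
        user_data[row['Name']] = row['phone']
```

If two rows share a name, the later phone number replaces the earlier one.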

edited Apr 03 '16 at 22:02

answered Apr 03 '16 at 21:59


the OP wanted a dictionary as final result, your construct will give him a list of dictionaries – oz123 Apr 03 '16 at 22:01
Thanks I am still not clear but both options will work – PyNEwbie Apr 03 '16 at 22:03
@PyNEwbie I tried your second code. I got one phone number assigned to a name, but I want both the name and the phone, if the phone exists. Also, for some reason I'm only getting the value of one row, though the file has several rows. – Luis Ramon Ramirez Rodriguez Apr 03 '16 at 22:23
@Luis getting the value of one row probably because python dicts don't support duplicate keys - the last one wins. If you need duplicate keys, possible workarounds here: http://stackoverflow.com/questions/10664856/make-dictionary-with-duplicate-keys-in-python – kjarsenal Apr 03 '16 at 22:49

gboffi · Answer 3 · 2016-04-03T22:31:48.953

Is it possible that what you're looking for is throwing away some info in your data file?

In [26]: !cat data00.csv
Name,Phone,Address
goofey,,ade
mickey,1212,heaven
tip,3231,earth

In [27]: f = open('data00.csv')

In [28]: r = csv.DictReader(f)

In [29]: lod = [{'Name':rec['Name'], 'Phone':rec['Phone']} for rec in r if rec['Phone']]

In [30]: lod
Out[30]: [{'Name': 'mickey', 'Phone': '1212'}, {'Name': 'tip', 'Phone': '3231'}]

In [31]:

On the other hand, should your file contain ONLY Name and Phone columns, it's just

In [31]: lod = [rec for rec in r if rec['Phone']]

score 1 · Accepted Answer · answered Apr 03 '16 at 22:18

You can use dict to convert your list of tuples into a dictionary. Also, use get and strip so that records with a missing or blank phone value are skipped.

import csv

user_data = {}
with open(phone_numbers) as f:
    reader = csv.DictReader(f)
    user_data = dict([(record['Name'], record['phone']) for record in reader if (record.get('phone') or '').strip()])

If you want separate sequences of names and phones, you can unpack with zip and the * operator:

with open(phone_numbers) as f:
    reader = csv.DictReader(f)
    names, phones = zip(*[(record['Name'], record['phone']) for record in reader if (record.get('phone') or '').strip()])
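
One caveat with zip(*...): if no row has a phone number, the list of pairs is empty and the two-name unpacking raises ValueError. A minimal sketch that guards against this, for Python 3, is below. The helper name split_names_phones and the in-memory sample files are hypothetical, introduced only for illustration.

```python
import csv
import io

def split_names_phones(fileobj):
    """Return (names, phones) tuples for rows that have a non-blank phone."""
    reader = csv.DictReader(fileobj)
    pairs = [(r['Name'], r['phone']) for r in reader
             if (r.get('phone') or '').strip()]
    if not pairs:
        # zip(*[]) yields nothing, so unpacking into two names would fail.
        return (), ()
    names, phones = zip(*pairs)
    return names, phones

names, phones = split_names_phones(
    io.StringIO("Name,phone\nLuis,000 111 2222\nAndrea,\n"))
empty = split_names_phones(io.StringIO("Name,phone\nAndrea,\n"))
```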

answered Apr 03 '16 at 22:18

styvane


Thanks, both worked. Will the dict approach work for more than two items? Also, it takes the first value as the key; does this mean it will break if there are duplicate values? – Luis Ramon Ramirez Rodriguez Apr 03 '16 at 22:41
Yes, it will work for more than two items. It will not break if you have a duplicate `name` in your file, but only the last value will be kept. The best thing to do if you have a duplicate key, I mean `name`, is to keep your result as a list of `tuple`s. BTW, that is what `tuple` is for. Also, don't forget to accept the answer if it helped. – styvane Apr 03 '16 at 22:52
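
If duplicate names should keep all of their phone numbers instead of only the last one, one option is to collect them in a list per name. This is a hedged sketch for Python 3 using collections.defaultdict; the variable name phones_by_name and the sample rows are assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data in which 'Luis' appears twice.
fh = io.StringIO("Name,phone\nLuis,111\nPaul,222\nLuis,333\nAndrea,\n")

phones_by_name = defaultdict(list)
for row in csv.DictReader(fh):
    if row['phone']:
        # Append instead of assigning, so earlier numbers are not overwritten.
        phones_by_name[row['Name']].append(row['phone'])
```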

score 1 · Answer 5 · answered Apr 03 '16 at 22:38

I normally index into each row by column position:

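import csv

# Column positions are assumptions; adjust them to match your file.
NAME_COL = 0   # index of the column containing the name
PHONE_COL = 1  # index of the column containing the phone

user_data = {}
with open('mycsv.csv', 'r') as infile:  # avoid shadowing the built-in input()
    for row in csv.reader(infile):
        if row[PHONE_COL]:
            user_data[row[NAME_COL]] = row[PHONE_COL]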
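
With csv.reader the header row comes back like any other row, so it would end up in user_data too. A minimal sketch for Python 3 that consumes the header first and looks up the column positions from it is below. The header names 'Name' and 'phone' follow the question, while io.StringIO and the sample rows are illustrative assumptions.

```python
import csv
import io

# Hypothetical sample data standing in for mycsv.csv.
infile = io.StringIO("Name,phone\nLuis,000 111 2222\nAndrea,\nJorge,111 222 3333\n")

reader = csv.reader(infile)
header = next(reader)              # consume the header row
name_col = header.index('Name')    # position of the name column
phone_col = header.index('phone')  # position of the phone column

user_data = {}
for row in reader:
    if row[phone_col]:
        user_data[row[name_col]] = row[phone_col]
```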

answered Apr 03 '16 at 22:38

kjarsenal


score 1 · Answer 6 · answered Apr 05 '16 at 16:29

You were correct the whole time, except for the unpacking.

result = [(record["name"], record["phone"]) for record in reader if record["phone"]]
# this gives [(name1, phone1), (name2,phone2),....]

You have to do [dostuff for name, phone in result], not name, phone = result. The latter tries to unpack the whole list into two names, which raises the ValueError whenever the list does not hold exactly two items.
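
A small sketch of the difference, using made-up sample tuples (the names and numbers are illustrative assumptions): unpacking inside the loop works for any number of rows, whereas a plain two-target assignment raises ValueError as soon as the list holds more than two tuples.

```python
# Hypothetical sample of what the comprehension over the reader produces.
result = [("Luis", "000 111 2222"),
          ("Paul", "000 222 3333"),
          ("Jorge", "111 222 3333")]

# Unpack each tuple inside the loop: works for any number of rows.
lines = ["%s: %s" % (name, phone) for name, phone in result]

# Unpacking the whole list into two names fails with three items.
try:
    name, phone = result
except ValueError as exc:
    error = str(exc)
```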
