
I'm trying to parse a CSV file using the built-in Python csv reader as follows:

import csv
import requests

sms_prices_list_url = "http://www.twilio.com/resources/rates/international-sms-rates.csv"
sms_prices_list = requests.get(sms_prices_list_url)
reader = csv.reader(sms_prices_list.text)
for row in reader:
    print row

However, when I do this, almost everything is printed one character at a time, rather than one column or dict item per row, e.g.:

['C']
['o']
['u']
['n']
['t']
['r']
['y']
['', '']
[' ']
['N']
['a']
['m']
['e']
['', '']
[' ']
['R']
['a']
['t']
['e']
[]
['', '']
['UNITED STATES Inbound SMS - Other']
['', '']
['0']

How can I separate these entries into a list of dictionaries?

glglgl
user714852

4 Answers


csv.reader expects its argument to yield one line of text at a time. You are iterating over a string, which yields one character at a time. Change it to:

reader = csv.reader(sms_prices_list.iter_lines())

Note that this won't give you a list of dictionaries, but an iterable of lists, since that's what csv.reader is meant to give you. Also, it may break if the input is in a Unicode encoding other than UTF-8; see the documentation for some hints about that.
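If you specifically want dictionaries, the same line-by-line idea works with `csv.DictReader`. Here is a minimal sketch using an in-memory sample in place of the downloaded text (the sample data is hypothetical):

```python
import csv
import io

# Sample standing in for the downloaded CSV body (hypothetical data)
sample = "Country,Name,Rate\nUS,UNITED STATES Inbound SMS - Other,0.010\n"

# DictReader also wants one line of text at a time; io.StringIO provides that
reader = csv.DictReader(io.StringIO(sample))
list_of_dicts = list(reader)
print(list_of_dicts[0]["Rate"])  # -> 0.010
```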

BrenBarn
  • I feel like this should also be filed as a bug against csv. His usage is exactly something that I would expect to work. – U2EF1 Jun 09 '13 at 08:07
  • 3
    It's not a bug, since the documentation tells you it's supposed to work exactly the way it does. It could be a feature request, I suppose, but since it's already worked this way for years, I doubt the Python community would be particularly willing to change it. – David Z Jun 09 '13 at 08:12
  • 1
    Since your answer is the one rating highest, would you update it to use `iter_lines`? Shorter and at least as efficient. – Thomas Fenzl Jun 09 '13 at 08:36
  • 1
    [csv reader will break on non-ascii characters if the input is Unicode in Python 2](http://ideone.com/qusHb5). You might need `.text.encode("utf-8").splitlines()` or `.content.splitlines()` if input is already utf-8. `.iter_lines()` might need `stream=True` and Unicode -> utf-8 translating. – jfs Jun 09 '13 at 10:23

Since you are passing csv.reader the raw text, it iterates over the string one character at a time. Use StringIO to get around this:

import StringIO
import csv
import requests

r = requests.get('http://www.twilio.com/resources/rates/international-sms-rates.csv')
reader = csv.DictReader(StringIO.StringIO(r.text))
row = next(reader) # get the next row
print(row)

The above will give you:

{'Country': '', ' Rate': '0.010', ' Name': 'UNITED STATES Inbound SMS - Other'}

You can now loop through it:

for row in reader:
    print(row)
    # do whatever with row

Final thought: if you need a list of dictionaries, you don't need a loop:

reader = csv.DictReader(StringIO.StringIO(r.text))
list_of_dicts = list(reader)
Burhan Khalid

You can use StringIO to pass a string to csv.reader, as described in this answer.
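A minimal sketch of that approach (using `io.StringIO`; the sample text here stands in for the downloaded `.text`):

```python
import csv
import io

text = "Country,Rate\nUS,0.010\n"  # stands in for sms_prices_list.text
# Wrapping the string in a file-like object makes csv.reader see lines,
# not individual characters
rows = list(csv.reader(io.StringIO(text)))
print(rows)  # -> [['Country', 'Rate'], ['US', '0.010']]
```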

Pavel Strakhov

To be more efficient and to fix your problem, use urllib.urlopen:

import urllib, csv
sms_prices_list_url = "http://www.twilio.com/resources/rates/international-sms-rates.csv"
sms_prices_list = urllib.urlopen(sms_prices_list_url)
reader = csv.reader(sms_prices_list)
for row in reader:
    print row
korylprince
  • Why is this more efficient than using requests? – Thomas Fenzl Jun 09 '13 at 08:35
  • Because the csv reader reads the data directly from the website (line by line). The others load the entire text into memory then split it into a list or wrap it into an object. If the csv was incredibly large, the other methods could fill up memory. In my opinion, this is the most pythonic way to do it. – korylprince Jun 09 '13 at 08:38
  • I'd use `response.iter_lines`, the only advantage I see in using urllib is that it's standard library... – Thomas Fenzl Jun 09 '13 at 08:50
  • well you can also do read the content of an URL using a stream using requests: [Body Content Workflow](http://www.python-requests.org/en/latest/user/advanced/#body-content-workflow) from requests documentation. – zmo Jun 09 '13 at 12:27
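The streaming workflow discussed in these comments can be sketched with a small helper; `rows_from_lines` is a hypothetical name, and the requests calls are shown only as comments because they would hit the network:

```python
import csv

def rows_from_lines(lines):
    """Feed csv.reader an iterable of text lines and collect the rows."""
    return list(csv.reader(lines))

# Hypothetical streaming usage with requests (not run here):
#   resp = requests.get(sms_prices_list_url, stream=True)
#   rows = rows_from_lines(resp.iter_lines(decode_unicode=True))

# The helper works on any iterable of lines, e.g. an in-memory sample:
print(rows_from_lines(["Country,Rate", "US,0.010"]))
```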