
When I curl an API link, http://example.com/passkey=wedsmdjsjmdd:

curl 'http://example.com/passkey=wedsmdjsjmdd'

I get the employee output data back in CSV format, like:

"Steve","421","0","421","2","","","","","","","","","421","0","421","2"

How can I parse through this using Python?

I tried:

import csv 
cr = csv.reader(open('http://example.com/passkey=wedsmdjsjmdd',"rb"))
for row in cr:
    print row

but it didn't work, and I got an error:

http://example.com/passkey=wedsmdjsjmdd No such file or directory:

Thanks!

Jean-François Fabre
mongotop
  • Can you access that domain directly? – brbcoding Apr 29 '13 at 16:37
  • you need to open the url and read it in as a big text string (see urllib/requests), then I assume you can initialize the csv reader with a string instead of a file object, but I don't know, I've always used it with an open filehandle. – Joran Beasley Apr 29 '13 at 16:39
  • @brbcoding, yes. I can get csv file when I put the link on the browser. – mongotop Apr 29 '13 at 16:42
  • @JoranBeasley, I think that your method is correct, maybe I need something like this `http://processing.org/reference/loadStrings_.html` but using python – mongotop Apr 29 '13 at 16:43
  • FYI: the `read_csv` function in the `pandas` library (http://pandas.pydata.org/) accepts URLs. See http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html – Warren Weckesser Apr 29 '13 at 17:35
  • Duplicate of [How do I read and write CSV files with Python?](http://stackoverflow.com/q/41585078/562769) and [Get webpage contents with Python?](http://stackoverflow.com/a/38428249/562769). See [What if a question is an exact duplicate of the conjunction of two other questions](http://meta.stackexchange.com/q/122416/158075) – Martin Thoma Jan 11 '17 at 07:50

9 Answers


Using pandas, it is very simple to read a CSV file directly from a URL:

import pandas as pd
data = pd.read_csv('https://example.com/passkey=wedsmdjsjmdd')

This will read your data into tabular format, which is very easy to process.

James Wierzba
Kathirmani Sukumar
  • This is one of the simplest approaches I have come across so far! – Jawairia May 17 '18 at 06:21
  • So long as your CSV file fits into memory, this is okay. – JeffHeaton Apr 20 '19 at 22:23
  • Didn't work for me; maybe I ran out of memory. `pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 33, saw 2` – J Agustin Barrachina Apr 13 '20 at 15:13
  • Is there any way to use this with a retry? Many times I get a 500 error, and when I `read_csv` again it works. This happens a lot when I am reading from Google Sheets. – Dinero Aug 12 '20 at 01:50
  • This answer worked. The other with `csv.reader()` always gave me a `_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)`. – Roland Jun 08 '23 at 23:15
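On the retry question in the comments: `read_csv` has no built-in retry, but a small wrapper can re-attempt the request on transient HTTP errors. This is a sketch; `read_csv_with_retry` is a hypothetical helper, and the assumption that pandas surfaces HTTP failures as `urllib.error.HTTPError` may vary by pandas version.

```python
import time
import urllib.error

import pandas as pd

def read_csv_with_retry(source, attempts=3, delay=2.0):
    """Call pd.read_csv up to `attempts` times, sleeping between failures."""
    for attempt in range(attempts):
        try:
            return pd.read_csv(source)
        except urllib.error.HTTPError:
            if attempt == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)
```

A 500 from the server would then be retried a couple more times before the exception propagates.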

You need to replace `open` with `urllib.urlopen` or `urllib2.urlopen`.

e.g.

import csv
import urllib2

url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib2.urlopen(url)
cr = csv.reader(response)

for row in cr:
    print row

This would output the following

Year,City,Sport,Discipline,NOC,Event,Event gender,Medal
1924,Chamonix,Skating,Figure skating,AUT,individual,M,Silver
1924,Chamonix,Skating,Figure skating,AUT,individual,W,Gold
...

The original question is tagged "python-2.x", but for a Python 3 implementation (which requires only minor changes) see below.

TheDudeAbides
eandersson
  • can you pass that to csv_reader? I guess so... it's pretty "file-like", but I've never done it or even thought to do that – Joran Beasley Apr 29 '13 at 16:45
  • lol, I dunno that I was right, I was just asking... hadn't ever seen that done before – Joran Beasley Apr 29 '13 at 16:47
  • I just assumed that it worked, to be honest. Which is crazy, as I have used this hundreds of times. :D – eandersson Apr 29 '13 at 16:50
  • I think urllib2.urlopen returns a file-like object, so you can probably just remove the `.read()`, and pass `response` to the `csv.reader`. – Dave Challis Apr 29 '13 at 16:50
  • It does, but at least for me I don't get the expected output. I think it's a formatting issue. – eandersson Apr 29 '13 at 16:51
  • when I try to output the result `print cr` I get this `<_csv.reader object at 0x8e3db54> ` – mongotop Apr 29 '13 at 16:54
  • @mongotop that means it is working... That shows you where the object is in memory. Looks like it only reads a line at a time, so maybe `cr.next()` inside a loop is what you are looking for. (haven't used csv reader myself...) – brbcoding Apr 29 '13 at 16:55
  • Like @brbcoding said. I updated my example demonstrating how to display the result. – eandersson Apr 29 '13 at 16:57
  • I got this output: `['>'] ` – mongotop Apr 29 '13 at 16:58
  • no I wasn't, but when I did, I got an output, but empty: `['Method Not Allowed']` – mongotop Apr 29 '13 at 17:04
  • You did not include the address you are trying to download the data from. It looks like your web server won't allow the request. Try the csv I included in my example. And as an alternative to urllib2 you could try requests as well http://docs.python-requests.org/en/latest/ – eandersson Apr 29 '13 at 17:04
  • First of all, thanks a lot for putting up a live example!! That is very helpful. When I tried to add csv I got this error: `response = urllib2.urlopen(NewUrlCall+'.csv',"rb").read() File "/usr/lib/python2.6/urllib2.py", line 124, in urlopen return _opener.open(url, data, timeout) File "/usr/lib/python2.6/urllib2.py", line 395, in open response = meth(req, response) File "/usr/lib/python2.6/urllib2.py", line 508, in http_response http_error_default raise HTTPError(req.get_full_url(), code, msg, hdrs, fp) urllib2.HTTPError: HTTP Error 405: Method Not Allowed` – mongotop Apr 29 '13 at 17:12
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/29128/discussion-between-mongotop-and-eandersson) – mongotop Apr 29 '13 at 17:12
  • please check the chat for more info, – mongotop Apr 29 '13 at 17:20

You could do it with the requests module as well:

import csv
import requests

url = 'http://winterolympicsmedals.com/medals.csv'
r = requests.get(url)
text = r.iter_lines()
reader = csv.reader(text, delimiter=',')
Rodo
  • Works like a charm! Thank you for submitting your answer! – mongotop Mar 22 '16 at 18:47
  • One question. The reader variable is a `_csv.reader` object. When I iterate through this object to print the contents, I get the following error: `Error: iterator should return strings, not bytes (did you open the file in text mode?)`. How do I read the contents of the csv reader object and, say, load it into a pandas dataframe? – Harikrishna Jan 17 '18 at 21:04
  • @Harikrishna this is probably a problem in Python 3, and this case is answered here: https://stackoverflow.com/questions/18897029/read-csv-file-from-url-into-python-3-x-csv-error-iterator-should-return-str – Michal Skop Apr 12 '18 at 01:22
  • This reads the whole thing into memory, not really necessary, especially if you are going to use csv.reader. At this point, just use Pandas. – JeffHeaton Dec 24 '22 at 15:53
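On the bytes-vs-strings error raised in the comments: under Python 3, `r.iter_lines()` yields `bytes`, while `csv.reader` wants `str`. One way around this (a sketch; `rows_from_byte_lines` is a hypothetical helper name) is to decode each line before handing it to the reader:

```python
import csv

def rows_from_byte_lines(byte_lines, encoding="utf-8"):
    """Wrap an iterable of bytes lines (e.g. r.iter_lines()) for csv.reader."""
    return csv.reader(line.decode(encoding) for line in byte_lines)

# With requests, this would look like:
#   r = requests.get(url)
#   for row in rows_from_byte_lines(r.iter_lines()):
#       print(row)
```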

When downloading a large file, the approach below may work a bit more efficiently:

import requests
from contextlib import closing
import csv

url = "http://download-and-process-csv-efficiently/python.csv"

with closing(requests.get(url, stream=True)) as r:
    reader = csv.reader(r.iter_lines(), delimiter=',', quotechar='"')
    for row in reader:
        # Handle each row here...
        print row   

By setting `stream=True` in the GET request, when we pass `r.iter_lines()` to `csv.reader()`, we are passing a generator to `csv.reader()`. By doing so, we enable `csv.reader()` to lazily iterate over each line in the response with `for row in reader`.

This avoids loading the entire file into memory before we start processing it, drastically reducing memory overhead for large files.

The Aelfinn
  • This is one great solution! Thank you @The Aelfinn! – mongotop Jul 31 '16 at 21:57
  • Great solution, but I had to also `import codecs` and wrap the `r.iter_lines()` within `codecs.iterdecode()`, like so: `codecs.iterdecode(r.iter_lines(), 'utf-8')` ... in order to solve `byte` vs `str` issues, unicode decoding problems and universal newline problems. – Irvin H. Mar 23 '17 at 15:22
  • I was looking for a solution like this, with requests. – Save Sep 06 '20 at 17:40
  • I like this solution a lot – mit Dec 22 '21 at 09:50
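Combining this answer with the `codecs.iterdecode` suggestion from the comments, a Python 3 variant of the streaming approach might look like this (a sketch; `stream_csv_rows` is a hypothetical helper, and the CSV is assumed to be UTF-8):

```python
import codecs
import csv
from contextlib import closing

import requests

def stream_csv_rows(url, encoding="utf-8"):
    """Yield parsed CSV rows from a URL without loading the whole file."""
    with closing(requests.get(url, stream=True)) as r:
        r.raise_for_status()
        # iterdecode turns the bytes generator into a generator of str lines
        lines = codecs.iterdecode(r.iter_lines(), encoding)
        for row in csv.reader(lines, delimiter=",", quotechar='"'):
            yield row
```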

This question is tagged python-2.x so it didn't seem right to tamper with the original question, or the accepted answer. However, Python 2 is now unsupported, and this question still has good google juice for "python csv urllib", so here's an updated Python 3 solution.

It's now necessary to decode urlopen's response (in bytes) into a valid local encoding, so the accepted answer has to be modified slightly:

import csv, urllib.request

url = 'http://winterolympicsmedals.com/medals.csv'
response = urllib.request.urlopen(url)
lines = [l.decode('utf-8') for l in response.readlines()]
cr = csv.reader(lines)

for row in cr:
    print(row)

Note the extra line beginning with `lines =`, the fact that `urlopen` is now in the `urllib.request` module, and that `print` of course requires parentheses.

It's hardly advertised, but yes, `csv.reader` can read from a list of strings.

And since someone else mentioned pandas, here's a pandas rendition that displays the CSV in a console-friendly output:

python3 -c 'import pandas
df = pandas.read_csv("http://winterolympicsmedals.com/medals.csv")
print(df.to_string())'

Pandas is not a lightweight library, though. If you don't need the things that pandas provides, or if startup time is important (e.g. you're writing a command line utility or any other program that needs to load quickly), I'd advise that you stick with the standard library functions.

TheDudeAbides
  • Thank you @TheDudeAbides for providing an updated solution! – mongotop Jun 29 '20 at 15:07
  • Just want to add that `import pandas` alone will be an order of magnitude slower than any other solution on this page. So don't go `pip install pandas` JUST because you see that you can do a cool one-liner with it; it also brings in numpy as a dependency, and it's all downhill from there. Same goes for `import requests`, although not to such a degree. – TheDudeAbides Apr 13 '21 at 01:13
import pandas as pd
url='https://raw.githubusercontent.com/juliencohensolal/BankMarketing/master/rawData/bank-additional-full.csv'
data = pd.read_csv(url, sep=";")  # use sep="," for comma separation
data.describe()


user2458922
  • With python 3.8: Exception has occurred: AttributeError module 'pandas' has no attribute 'describe' – MiKK Apr 25 '22 at 13:51

I am also using this approach for csv files (Python 3.6.9):

import csv
import io
import requests

url = 'http://winterolympicsmedals.com/medals.csv'  # any CSV URL
r = requests.get(url)
buff = io.StringIO(r.text)
dr = csv.DictReader(buff)
for row in dr:
    print(row)
Michal Skop

None of the above solutions worked for me with Python 3. I got all the "famous" error messages, like `_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)` and `_csv.Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?`. So I was a bit stuck.

My mistake was that I used `response.text`, where `response` is a `requests.models.Response` object, when I should have used `response.content` instead (as the first error suggested). That let me decode the UTF-8 correctly and split the lines afterwards. So here is my solution:

import csv
import reqto

response = reqto.get("https://example.org/utf8-data.csv")
# Do some error checks to avoid bad results
if response.ok and len(response.content) > 0:
    reader = csv.DictReader(response.content.decode('utf-8').splitlines(), dialect='unix')
    for row in reader:
        print(f"DEBUG: row={row}")

The above example already gives me back a dict for each row, though with a leading `#` on each dict key, which I may have to live with.

Roland

What you were trying to do with the curl command was to download the file to your local hard drive (HD). You do, however, need to specify a path on the HD:

curl http://example.com/passkey=wedsmdjsjmdd -o ./example.csv
cr = csv.reader(open('./example.csv',"r"))
for row in cr:
    print row
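For completeness, the same download-then-parse idea also works from Python 3's standard library alone (a sketch; `download_and_read` is a hypothetical helper name):

```python
import csv
import urllib.request

def download_and_read(url, local_path="./example.csv"):
    """Save the CSV to a local file first, then parse it with csv.reader."""
    urllib.request.urlretrieve(url, local_path)
    with open(local_path, newline="") as f:
        return list(csv.reader(f))
```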