Download data from URL in Python 3.6

Question

I want to download the data at https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData into a pandas DataFrame.

I have tried the scripts below, but could not get them to work.

import requests, io
import pandas as pd

URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'

#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(len(rawData))

Error: Python IDLE hangs.


#2
r = requests.get(URL)  
urlData = pd.read_csv(io.StringIO(r))
print(len(urlData))

Error:
urlData = pd.read_csv(io.StringIO(r))
TypeError: initial_value must be str or None, not Response

#3
urlData = pd.read_csv(URL, header=None)
print(len(urlData))

Possible duplicate of [https://stackoverflow.com/questions/7243750/download-file-from-web-in-python-3](https://stackoverflow.com/questions/7243750/download-file-from-web-in-python-3) — Alistair Carscadden, Aug 09 '18 at 05:20
@AlistairCarscadden How is that a duplicate? One question is about using `requests` and feeding the result to `pandas`; the other is about using `httplib2`. — abarnert, Aug 09 '18 at 05:23
What does "could not succeeded" mean? What happens? An exception? Then paste the exception here. No error, and `rawData` isn't empty, but doesn't have the results you expected? Then show us what you expected and what you got. If you don't tell us what the problem is, we can't debug it. — abarnert, Aug 09 '18 at 05:24
When I run your #1, it takes a long time, but it prints out `6314507`, which seems to be the right answer. — abarnert, Aug 09 '18 at 05:28
@abarnert, thanks for the reply. In second script I got error "TypeError: initial_value must be str or None, not Response" — Learnings, Aug 09 '18 at 05:38
@SPy Yes, your second script is wrong. But that doesn't change the fact that your first script works. — abarnert, Aug 09 '18 at 05:46

score 3 · Accepted Answer · answered Aug 09 '18 at 05:34

I got this working by passing sep="\t" to read_csv, so the file is parsed as tab-delimited:

import requests, io
import pandas as pd

URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'

#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')), sep="\t")
print(rawData.head())
print(rawData.info())
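
To see what sep="\t" changes without downloading the full file, here is a minimal sketch on a small inline sample. The sample rows, the column names, and the helper name parse_tsv are made up for illustration; they are not taken from the BLS file.

```python
import io

import pandas as pd

# Hypothetical tab-delimited sample (illustrative only, not real BLS data).
SAMPLE = (
    "series_id\tyear\tperiod\tvalue\n"
    "AAA\t2000\tM01\t1.5\n"
    "BBB\t2001\tM02\t2.5\n"
)


def parse_tsv(text):
    """Parse tab-delimited text into a DataFrame."""
    return pd.read_csv(io.StringIO(text), sep="\t")


# With the default comma separator, each line ends up in a single column.
default_df = pd.read_csv(io.StringIO(SAMPLE))
# With sep="\t", the fields are split into separate columns.
tabbed_df = parse_tsv(SAMPLE)

print(default_df.shape)
print(tabbed_df.shape)
```

The same io.StringIO wrapper is why the question's second attempt fails: it needs a str (for example the decoded content), not the Response object itself.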

score 2 · Answer 2 · answered Aug 09 '18 at 07:30

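The simplest way is to use urllib.request (urllib2 exists only in Python 2; in Python 3 its functionality lives in urllib.request).

import urllib.request

url_name = 'http://abc.pdf'  # placeholder URL
response = urllib.request.urlopen(url_name)
# response.read() returns bytes, so open the output file in binary mode
with open(url_name.split('//')[1], 'wb') as file:
    file.write(response.read())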


Harshita


thanks, this also helps for some other requirements. – Learnings Aug 09 '18 at 10:00

score 0 · Answer 3 · answered Aug 09 '18 at 05:31

I tried to download the data through the URL, and it does take a very long time. I recommend downloading the file with wget first and then processing it locally. The script itself seems fine.
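
If wget is not available, the same download-first approach can be sketched in Python with only the standard library. This is a rough sketch, not a tuned implementation: the function name download_to_file and the output filename in the usage comment are assumptions chosen for illustration.

```python
import os
import shutil
import urllib.request


def download_to_file(url, path):
    """Stream the response body to disk without holding it all in memory.

    Returns the size of the saved file in bytes.
    """
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # copyfileobj copies in fixed-size chunks rather than one big read().
        shutil.copyfileobj(response, out)
    return os.path.getsize(path)


# Example usage (hypothetical local filename; left commented out because it
# performs a large network download):
# download_to_file(
#     "https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData",
#     "ln.data.1.AllData",
# )
```

Once the file is on disk, it can be loaded with pd.read_csv(path, sep="\t") as in the accepted answer, and re-run without repeating the slow download.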


Martin Liu

