I want to download the data from https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData into a pandas DataFrame.

I have tried the scripts below, without success.

import requests, io
import pandas as pd

URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'

#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(len(rawData))

Error: Python IDLE gets stuck


#2
r = requests.get(URL)  
urlData = pd.read_csv(io.StringIO(r))
print(len(urlData))

Error:
urlData = pd.read_csv(io.StringIO(r))
TypeError: initial_value must be str or None, not Response

#3
urlData = pd.read_csv(URL, header=None)
print(len(urlData))
Learnings
  • Possible duplicate of [https://stackoverflow.com/questions/7243750/download-file-from-web-in-python-3](https://stackoverflow.com/questions/7243750/download-file-from-web-in-python-3) – Alistair Carscadden Aug 09 '18 at 05:20
  • @AlistairCarscadden How is that a duplicate? One question is about using `requests` and feeding the result to `pandas`; the other is about using `httplib2`. – abarnert Aug 09 '18 at 05:23
  • What does "could not succeeded" mean? What happens? An exception? Then paste the exception here. No error, and `rawData` isn't empty, but doesn't have the results you expected? Then show us what you expected and what you got. If you don't tell us what the problem is, we can't debug it. – abarnert Aug 09 '18 at 05:24
  • When I run your #1, it takes a long time, but it prints out `6314507`, which seems to be the right answer. – abarnert Aug 09 '18 at 05:28
  • @abarnert, thanks for the reply. In the second script I got the error "TypeError: initial_value must be str or None, not Response" – Learnings Aug 09 '18 at 05:38
  • @SPy Yes, your second script is wrong. But that doesn't change the fact that your first script works. – abarnert Aug 09 '18 at 05:46
  • @abarnert, I agree. The first script works now, thanks. – Learnings Aug 09 '18 at 05:49
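
The TypeError in #2 comes from handing the `Response` object itself to `io.StringIO`, which only accepts a `str` (or `None`). A `requests` response exposes its decoded body as `.text`, so `io.StringIO(r.text)` is the likely fix. A minimal offline sketch of the difference, assuming a hypothetical `FakeResponse` class in place of a real network call and made-up sample rows:

```python
import io

import pandas as pd


class FakeResponse:
    """Hypothetical stand-in for requests.Response; only .text is modelled."""

    def __init__(self, text):
        self.text = text


# Made-up tab-delimited sample, not real BLS data.
r = FakeResponse("series_id\tyear\tvalue\nSERIES_A\t2018\t1.5\n")

# Passing the response object itself reproduces the TypeError from #2.
try:
    io.StringIO(r)
    error = None
except TypeError as exc:
    error = exc

# Passing the decoded body (a str) works.
df = pd.read_csv(io.StringIO(r.text), sep="\t")
```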

3 Answers

I got this working by passing `sep="\t"` to `read_csv`:

import requests, io
import pandas as pd

URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'

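# Download the raw bytes, decode them, and parse as tab-separated text
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')), sep="\t")
print(rawData.head())
rawData.info()  # info() prints its summary itself and returns None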
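
To see why the separator matters, here is a small offline sketch. The sample rows are made up for illustration and only mimic a tab-delimited layout; they are not taken from the BLS file.

```python
import io

import pandas as pd

# Made-up sample that only mimics a tab-delimited layout; not real BLS data.
sample = (
    "series_id\tyear\tperiod\tvalue\n"
    "SERIES_A\t2017\tM01\t1.5\n"
    "SERIES_A\t2017\tM02\t2.5\n"
)

# The default separator is a comma, so each line lands in a single column.
as_csv = pd.read_csv(io.StringIO(sample))

# With sep="\t" the four fields are split into separate columns.
as_tsv = pd.read_csv(io.StringIO(sample), sep="\t")

print(as_csv.shape)  # (2, 1)
print(as_tsv.shape)  # (2, 4)
```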
M.Rau

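The simplest way is to use urllib2 (Python 2 only; in Python 3 the equivalent module is urllib.request).

import urllib2  # Python 2 only

url_name = 'http://abc.pdf'  # placeholder URL
response = urllib2.urlopen(url_name)
# Save under the URL minus its scheme; 'wb' because the body is raw bytes
with open(url_name.split('//')[1], 'wb') as f:
    f.write(response.read())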
Harshita

I tried to download the data through the URL, and it does take a very long time. I recommend downloading the file with wget first and then processing it locally. The script itself seems fine.
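
If you would rather stay in Python than call wget, the same "download first, process afterwards" idea can be sketched with the standard library plus pandas. This is only a rough sketch: `download` and `count_rows` are hypothetical helper names, and it assumes the file is tab-delimited.

```python
import shutil
import urllib.request

import pandas as pd


def download(url, path):
    """Stream the response body to a local file without holding it all in memory."""
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)


def count_rows(path, chunksize=100_000):
    """Count data rows by reading the tab-delimited file one chunk at a time."""
    total = 0
    for chunk in pd.read_csv(path, sep="\t", chunksize=chunksize):
        total += len(chunk)
    return total
```

Usage would be something like `download(URL, "ln.data.1.AllData")` followed by `count_rows("ln.data.1.AllData")`. Reading in chunks keeps memory use bounded, at the cost of not having the whole DataFrame available at once.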

Martin Liu