1

I am learning to extract data from Bureau of Labor Statistics using their API. The sample code:

import requests
import json
import prettytable
headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid": ['LAUMT421090000000005'],"startyear":"2011", "endyear":"2014"})
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)
for series in json_data['Results']['series']:
    x=prettytable.PrettyTable(["series id","year","period","value","footnotes"])
    seriesId = series['seriesID']
    for item in series['data']:
        year = item['year']
        period = item['period']
        value = item['value']
        footnotes=""
        for footnote in item['footnotes']:
            if footnote:
                footnotes = footnotes + footnote['text'] + ','
       'if 'M01' <= period <= 'M12':'
            x.add_row([seriesId,year,period,value,footnotes[0:-1]])
    output = open(seriesId + '.txt','w')
    output.write (x.get_string())
    output.close()

I simply changed the seriesID to get the data I wanted. And the output the code generated is a text file named after the Series ID.

Text The data extracted is shown as:

Data A fraction of the actual text of the data:

LAUMT421090000000005 | 2014 | M12 | 405757 | Data were subject to revision on April 20, 2018. | | LAUMT421090000000005 | 2014 | M11 | 406061 | Data were subject to revision on April 20, 2018. | | LAUMT421090000000005 | 2014 | M10 | 405358 | Data were subject to revision on April 20, 2018. | | LAUMT421090000000005 | 2014 | M09 | 402164 | Data were subject to revision on April 20, 2018. | | LAUMT421090000000005 | 2014 | M08 | 400534 | Data were subject to revision on April 20, 2018.

Being a total novice in using Python and API, changing part of the source code to get the desirable data is the most I can achieve now. Due to work reasons, I need to present the data in Excel (I know I will be frowned upon, but this is how it is). However, Excel doesn't recognize "|" as the delimiter.

The sample code uses the prettytable library to generate the output data. I'm wondering if there are other ways to extract the data so that the result is easier to process or convert to comma separated values.

Thanks.

Community
  • 1
  • 1
Bowen Liu
  • 1,065
  • 1
  • 11
  • 24

2 Answers2

1

In my code below, I create a dataframe then output it to a .csv file using pandas.DataFrame.to_csv() (link to docs). In this example, I also add try and except clauses to raise an exception if the user enters an invalid Series ID (not the case with this example) or if the user exceeds the number of daily hits to the BLS API (Unregistered users may request up to 25 queries daily - per documentation FAQs).

In this example, I add a column showing the SeriesID which would prove more useful if your list of SeriesIDs contained more than one item and would help distinguish between different series. I also do some extra manipulation to the footnotes column to make it more meaningful in the outputted dataframe.

import pandas as pd
import json
import requests

headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid": ['LAUMT421090000000005'],"startyear":"2011", "endyear":"2014"})
p = requests.post('https://api.bls.gov/publicAPI/v1/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)
try:
    df = pd.DataFrame()
    for series in json_data['Results']['series']:
        df_initial = pd.DataFrame(series)
        series_col = df_initial['seriesID'][0]
        for i in range(0, len(df_initial) - 1):
            df_row = pd.DataFrame(df_initial['data'][i])
            df_row['seriesID'] = series_col
            if 'code' not in str(df_row['footnotes']): 
                df_row['footnotes'] = ''
            else:
                df_row['footnotes'] = str(df_row['footnotes']).split("'code': '",1)[1][:1]
            df = df.append(df_row, ignore_index=True)
    df.to_csv('blsdata.csv', index=False)
except:
    json_data['status'] == 'REQUEST_NOT_PROCESSED'
    print('BLS API has given the following Response:', json_data['status'])
    print('Reason:', json_data['message'])

If you would like to output a .xlsx file rather than a .csv, simply substitute df.to_csv('blsdata.csv', index=False) with df.to_excel('blsdata.xlsx', index=False, engine='xlsxwriter').

Expected Outputted .csv file:

Expected Outputted .csv

Expected Outputted .xlsx file if you use pandas.DataFrame.to_excel() rather than pandas.DataFrame.to_csv():

Expected Outputted .xlsx

patrickjlong1
  • 3,683
  • 1
  • 18
  • 32
  • What an amazing answer. Sorry it took me this long to respond. This issue didn't arise at work until today. I'm learning so much just by messing around and trying to understand your code piece by piece, like `len(df)` and `try except`. Just two tiny problems with the result: 1. The result won't include the first month in the beginning year. I will try use `for i in range(0, len(df_initial)):`. 2. The footnote column in my result table is empty. But these are small things. – Bowen Liu Sep 26 '18 at 17:54
  • I learned `for loop` and `pandas`, but never combined them together. What you did here really really opened my eyes. Thank you so much for that. Where can I read up more on stuff like this: not only using pandas to do what I normally would do in Excel, but also combine it with Python fundamentals. And in this case, how did you know what columns are in the json_data? Thanks. – Bowen Liu Sep 26 '18 at 17:56
  • I don't know anything about json data, so I am also wondering how ` df_initial = pd.DataFrame(series)` could work. I tried to return `json_data['Results']['series']` to see what it comprises of, but the returned result seems the same as just `json_data` – Bowen Liu Sep 26 '18 at 18:04
0
import requests
import json
import csv
headers = {'Content-type': 'application/json'}
data = json.dumps({"seriesid": ['LAUMT421090000000005'],"startyear":"2011", "endyear":"2014"})
p = requests.post('https://api.bls.gov/publicAPI/v2/timeseries/data/', data=data, headers=headers)
json_data = json.loads(p.text)

After the above lines of code, use the steps describe in the following stackoverflow link for the next steps: How can I convert JSON to CSV?

Hope it helps!

Arihant
  • 735
  • 5
  • 14