How to parse JSON from URL and download CSV files?

Question

I'm given a URL that returns some JSON text. The JSON contains URLs for CSV files. I'm trying to parse the JSON from the URL and download the CSV files. I am able to print the JSON from the URL, but I do not know how to extract the CSV file URLs from it.

import json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()
print(s)

The above prints the JSON from the URL (see the printout below). It contains URLs for CSV files that I then need to download using Python.

{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}

Please let us know if you have tried anything to parse the JSON text. How far did you get? — Joooeey, Mar 19 '20 at 22:22
It looks like the post from Muhammad Danial Khan below will work; however, I am getting the error AttributeError: 'bytes' object has no attribute 'read'. Not sure if I'm missing a library or something. — user2610930, Mar 19 '20 at 22:43
What is the issue, exactly? Have you tried anything, done any research? Stack Overflow is not a free code writing service, nor is it a guide/tutorial resource. Please see [ask], [help/on-topic]. — AMC, Mar 20 '20 at 02:59
My initial post is what I've already tried. I was able to pull the JSON from the URL; however, I needed some guidance on downloading the CSV links in the JSON. — user2610930, Mar 20 '20 at 12:20

score 0 · Answer 1 · answered Mar 19 '20 at 21:41

import json
from collections import namedtuple

#This is your "s"  -- data = s
data = '{"name": "John Smith", "hometown": {"name": "New York", "id": 123}}'

# Parse JSON into an object with attributes corresponding to dict keys.
x = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
print(x.name, x.hometown.name, x.hometown.id)

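This answer, from "How to convert JSON data into a Python object", loads the JSON into an object. You can then access each value as an attribute named after its JSON key. With data = s from the question, each CSV URL is the content attribute of an entry in results:

print([result.content for result in x.results])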

Of course you'll have to adjust the code to get it to work exactly how you want. I'm not a Python expert and have nothing to test with, but the idea is to load the JSON into a namedtuple object and access its values by key.

score 0 · Answer 2 · answered Mar 19 '20 at 21:48

import json
import urllib.request
with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
    s = url.read()

# assuming here you got that json content
s='{"id":68,"name":"Carrier Rates","state":"complete","user_id":166,"data_set_id":7,"bounding_date":{"id":101,"start_date":"2019-01-01T00:00:00.000-05:00","end_date":"2999-12-31T00:00:00.000-05:00","bounding_field_id":322,"related_id":68,"related_type":"Reports::Report"},"results":[{"id":68,"created_at":"2019-07-26T15:29:40.872-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.1dec2e6d-0c36-44b7-ab26-fd43fe710daf.csv"},{"id":67,"created_at":"2019-07-26T15:29:07.112-04:00","version_name":"07/26/2019 03:29PM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.3b02195e-c0a2-4abe-88f7-27d20ac76e07.csv"},{"id":35,"created_at":"2019-06-26T11:01:26.900-04:00","version_name":"06/26/2019 11:01AM","content":"https://test-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.a488c58d-5e04-4c28-a429-7167e9e8edaa.csv"},{"id":34,"created_at":"2019-06-26T10:57:51.396-04:00","version_name":"06/26/2019 10:57AM","content":"https://cloudtestlogistics-staging.s3.amazonaws.com/reports/manufacturer/carrier-test.bf73db19-5604-4a1d-bc31-da6cf25742cc.csv"}]}'

d = json.loads(s)

for f in d['results']:
    # manage download here
    csv_url = f['content']

score 0 · Accepted Answer · answered Mar 19 '20 at 22:09


The following code can help you.

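    import json
    import urllib.request

    with urllib.request.urlopen("http://staging.test.com/api/reports/68.json?auth_token=test") as url:
        # read() returns bytes, so decode before parsing with json.loads
        s = url.read().decode("utf-8")

    loadJson = json.loads(s)
    results = loadJson["results"]
    csvLinks = []
    for result in results:
        # each result's "content" field holds the URL of a CSV file
        csvLinks.append(result["content"])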

Now you have a list of links to CSV files. Download them using urllib.
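
For the download step itself, here is a minimal sketch using only the standard library. It assumes each link can be fetched with a plain GET and no extra authentication, which may not hold for the S3 URLs in the question. The helper names download_csv and download_all and the "reports" directory are made up for illustration; each file is named after the last path segment of its URL.

```python
import os
import posixpath
import shutil
import urllib.request
from urllib.parse import urlsplit


def download_csv(csv_url, dest_dir="."):
    """Save one CSV into dest_dir, named after the last segment of the URL path."""
    filename = posixpath.basename(urlsplit(csv_url).path)
    dest_path = os.path.join(dest_dir, filename)
    # Stream the response to disk so a large file is not held in memory.
    with urllib.request.urlopen(csv_url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path


def download_all(csv_links, dest_dir="."):
    """Download every link in csv_links and return the local file paths."""
    os.makedirs(dest_dir, exist_ok=True)
    return [download_csv(link, dest_dir) for link in csv_links]


# Hypothetical usage with the list built above (needs network access):
# saved_paths = download_all(csvLinks, "reports")
```

If two links end in the same file name, the later download overwrites the earlier one, so you may want to add the result id to the name in that case.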

answered Mar 19 '20 at 22:09

Muhammad Danial Khan


This looks like it can work; however, I get this error and am not sure if I'm missing a library or something else: AttributeError: 'bytes' object has no attribute 'read' – user2610930 Mar 19 '20 at 22:24
Take a look at this link: https://stackoverflow.com/questions/6541767/python-urllib-error-attributeerror-bytes-object-has-no-attribute-read/6541850 – Muhammad Danial Khan Mar 20 '20 at 13:52
Thanks, that worked after adding this: jsonResponse = json.loads(response.decode('utf-8')) – user2610930 Mar 20 '20 at 15:07
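
To illustrate the error discussed in these comments, here is a small offline sketch. It uses io.BytesIO as a stand-in for the HTTP response (an assumption made so the example runs without network access), and the example.com link is a placeholder, not one of the real report URLs. json.load expects a file-like object with a read() method, while json.loads expects the text (or bytes) itself.

```python
import io
import json

# Stand-in for the HTTP response: like the real one, read() returns bytes.
fake_response = io.BytesIO(
    b'{"results": [{"id": 68, "content": "https://example.com/a.csv"}]}'
)

raw = fake_response.read()  # bytes, same type as `s` in the question

# Passing the bytes to json.load() reproduces the error from the comments,
# because json.load() calls .read() on its argument.
try:
    json.load(raw)
except AttributeError as exc:
    print("json.load(bytes) failed:", exc)

# The fix from the comments: decode the bytes, then parse with json.loads().
parsed = json.loads(raw.decode("utf-8"))
csv_links = [result["content"] for result in parsed["results"]]
print(csv_links)

# json.load() works when given the file-like object itself instead of the bytes.
fake_response.seek(0)
same_parsed = json.load(fake_response)
```

On Python 3.6 and later, json.loads also accepts UTF-8 bytes directly, so the explicit decode is optional there.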
