From gzip to json to dataframe to csv

Question

I am trying to get some data from an open API:

https://data.brreg.no/enhetsregisteret/api/enheter/lastned

but I am having difficulties understanding the different types of objects involved and the order the conversions should happen in. Is it strings to bytes? Is it BytesIO or StringIO? Is it decode('utf-8') or decode('unicode'), etc.?
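To untangle the type question, here is a minimal sketch of the chain. It runs on a small hypothetical payload built in memory rather than on the real download. The HTTP body is bytes, decompressing it gives more bytes, decoding gives a str, and parsing gives Python objects. BytesIO wraps bytes and StringIO wraps str, so which one you need depends on which stage you are at.

```python
import gzip
import io
import json

# Hypothetical stand-in for response.read(): gzip-compressed, UTF-8 encoded JSON.
compressed = gzip.compress(json.dumps([{"navn": "Eksempel AS"}]).encode("utf-8"))

raw_json = gzip.decompress(compressed)  # bytes -> bytes (decompressed, still encoded)
text = raw_json.decode("utf-8")         # bytes -> str (decoded text)
data = json.loads(text)                 # str -> Python objects (here a list of dicts)

# File-like wrappers: BytesIO holds bytes, StringIO holds str.
# json.load() also accepts a binary stream: it reads bytes and detects UTF-8 (Python 3.6+).
from_bytes_stream = json.load(gzip.GzipFile(fileobj=io.BytesIO(compressed)))
from_text_stream = json.load(io.StringIO(text))

print(type(compressed).__name__, type(raw_json).__name__,
      type(text).__name__, type(data).__name__)  # prints: bytes bytes str list
```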

So far:

import gzip
import io
import json
import urllib.request

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)

This is where I am stuck. How should I write the next line of code? This is what I tried:

json_str = json.loads(decompressed_file.read().decode('utf-8'))

My workaround: if I write the data to a JSON file, read it back in, and then convert it to a DataFrame, it works:

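import json
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize

# Dump the decompressed bytes to disk...
with open('brreg.json', 'wb') as f:
    f.write(decompressed_file.read())

# ...then read them back in as text and parse the JSON
with open('brreg.json', encoding='utf-8') as fin:
    d = json.load(fin)

df = json_normalize(d)  # flatten the nested records into columns

with open('brreg_2.csv', 'w', encoding='utf-8', newline='') as fout:
    fout.write(df.to_csv())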
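The temporary brreg.json file shouldn't be necessary: the same chain can run in memory and go straight into pandas. This is a minimal sketch, assuming pandas 1.0 or later (for the top-level pd.json_normalize). A small hypothetical sample replaces the real download, and the function name gzipped_json_to_dataframe is made up for illustration.

```python
import gzip
import json

import pandas as pd


def gzipped_json_to_dataframe(gz_bytes: bytes) -> pd.DataFrame:
    """Decompress gzip bytes holding a JSON array and flatten it into a DataFrame."""
    raw = gzip.decompress(gz_bytes)             # bytes -> bytes (still UTF-8 encoded JSON)
    records = json.loads(raw.decode("utf-8"))   # bytes -> str -> list of dicts
    return pd.json_normalize(records)           # nested keys become dotted column names


# Hypothetical sample data standing in for the brreg download.
sample = [
    {"organisasjonsnummer": "123456789", "navn": "Eksempel AS",
     "forretningsadresse": {"postnummer": "0150", "poststed": "OSLO"}},
    {"organisasjonsnummer": "987654321", "navn": "Test ANS",
     "forretningsadresse": {"postnummer": "5003", "poststed": "BERGEN"}},
]
gz_bytes = gzip.compress(json.dumps(sample).encode("utf-8"))

df = gzipped_json_to_dataframe(gz_bytes)
csv_text = df.to_csv(index=False)  # with no path argument, to_csv returns the CSV as a str
print(csv_text)
```

With the real data, df.to_csv('brreg_2.csv', index=False) writes the file directly, so the manual open()/write() step isn't needed either.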

I found many SO posts about it, but I am still confused. The first one below explains it quite well, but I still need some spoon-feeding.

Python 3, read/write compressed json objects from/to gzip file

TypeError when trying to convert Python 2.7 code to Python 3.4 code

How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

What issues do you get with your code as it stands? It looks to me like you're missing a .decompress() on the GzipFile object — Simon Notley, Dec 18 '19 at 12:05
I get AttributeError: 'bytes' object has no attribute 'encode' — Jon, Dec 18 '19 at 12:29
Sorry, this is the traceback I get: AttributeError: 'str' object has no attribute 'read' — Jon, Dec 18 '19 at 13:12

Simon Notley · Accepted Answer · 2019-12-18T15:27:27.260

It works fine for me using the gzip.decompress() function instead of the GzipFile class to decompress the file, though I'm not sure why yet...

import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())  # the raw gzip bytes, held in memory
    decompressed_file = gzip.decompress(compressed_file.read())  # bytes of the JSON text (not a file)
    json_str = json.loads(decompressed_file.decode('utf-8'))  # the parsed data (Python objects), not a str
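Both variants in this answer end up with the same bytes. gzip.decompress() takes the whole compressed bytes object in one go, while gzip.GzipFile wraps a binary file-like object and decompresses as you read from it. Here is a minimal offline sketch with a small hypothetical payload. It also shows that a GzipFile, like any stream, is exhausted after one .read():

```python
import gzip
import io
import json

# Hypothetical stand-in for response.read(): a small gzip-compressed JSON array.
compressed = gzip.compress(json.dumps([{"navn": "Eksempel AS"}]).encode("utf-8"))

# One-shot: gzip.decompress() works on the complete compressed bytes object.
via_function = gzip.decompress(compressed)

# Streaming: GzipFile wraps a binary file-like object and decompresses on read().
gz = gzip.GzipFile(fileobj=io.BytesIO(compressed))
via_class = gz.read()
second_read = gz.read()  # the stream is already exhausted, so this returns b""

print(via_function == via_class, second_read)
```

That last point is one way to end up with "JSONDecodeError: Expecting value: line 1 column 1 (char 0)", one of the errors linked in the question. Calling json.loads() on the empty result of a second read leaves it nothing to parse.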

EDIT: In fact, the following also works fine for me, and it appears to be your exact code... (Further edit: it turns out it's not quite your exact code, because your final line was outside the with block, which meant response was no longer open when it was needed; see the comment thread.)

import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
    json_str = json.loads(decompressed_file.read().decode('utf-8'))
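Since everything here happens inside the with block anyway, the intermediate BytesIO can also be dropped. In Python 3 the object returned by urlopen() is a binary file-like object, so GzipFile can wrap it directly and decompress while reading. This is the approach asked about in the linked question on creating a GzipFile from the object urlopen() returns. The sketch below is minimal. The helper name load_gzipped_json is hypothetical, and a local file:// URL stands in for the brreg endpoint so the sketch runs offline.

```python
import gzip
import json
import tempfile
import urllib.request
from pathlib import Path


def load_gzipped_json(url):
    """Stream a gzip-compressed JSON document from `url` and parse it.

    GzipFile reads from the response as it decompresses, so no BytesIO copy
    is made, and parsing finishes while the response is still open.
    """
    with urllib.request.urlopen(url) as response, gzip.GzipFile(fileobj=response) as gz:
        return json.load(gz)  # json.load accepts a binary stream (Python 3.6+)


# Offline demo: a temporary local file, opened via a file:// URL, replaces the real endpoint.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "enheter.json.gz"
    path.write_bytes(gzip.compress(json.dumps([{"navn": "Eksempel AS"}]).encode("utf-8")))
    data = load_gzipped_json(path.as_uri())

print(data)
```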

Thanks, it works as expected now! I've learned that life is too short to ponder why things work, since there are plenty of things that do not ;) — Jon, Dec 18 '19 at 15:16
Ah, so in my code the last line was outside the with statement, but I don't see why that should have been a problem. Anyway, see my quote above. — Jon, Dec 18 '19 at 15:22
That sounds plausible as the cause of the issue. Certainly GzipFile just acts as a 'wrapper' for the underlying file object and I can imagine that BytesIO does the same. In that case all these lines depend upon 'response' still being open. — Simon Notley, Dec 18 '19 at 15:25
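One detail is worth checking against the explanation above: io.BytesIO(response.read()) copies the bytes into a new in-memory buffer, so that buffer (and a GzipFile wrapping it) does not depend on response staying open. This minimal sketch uses an in-memory BytesIO as a stand-in for the HTTP response to keep it offline, which is an assumption and not the real urlopen() object.

```python
import gzip
import io
import json

# Hypothetical gzip payload; an in-memory stream stands in for the HTTP response.
payload = gzip.compress(json.dumps({"navn": "Eksempel AS"}).encode("utf-8"))

with io.BytesIO(payload) as response:
    # response.read() returns a new bytes object, and BytesIO keeps its own copy of it.
    compressed_file = io.BytesIO(response.read())

# The stand-in response is closed here, but compressed_file is unaffected.
decompressed_file = gzip.GzipFile(fileobj=compressed_file)
data = json.loads(decompressed_file.read().decode("utf-8"))

print(response.closed, data)
```

If the line outside the with block still fails, it may be worth looking for another cause, such as the same GzipFile being read twice in one session.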
