I am trying to get some data from an open API:
https://data.brreg.no/enhetsregisteret/api/enheter/lastned
but I am having difficulty understanding the different types of objects and the order in which the conversions should happen. Is it strings to bytes, is it BytesIO or StringIO, is it decode('utf-8') or decode('unicode'), etc.?
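To make the order of the types concrete, here is a minimal sketch that needs no network access. The `payload` record is made-up sample data, not the API's actual schema. It assumes the data is UTF-8 encoded JSON inside gzip, as the code in this question suggests.

```python
import gzip
import io
import json

# Hypothetical sample data standing in for the API response.
payload = [{"navn": "Eksempel AS", "organisasjonsnummer": "123456789"}]

text = json.dumps(payload, ensure_ascii=False)   # Python objects -> str
raw = text.encode("utf-8")                       # str -> bytes
compressed = gzip.compress(raw)                  # bytes -> gzip-compressed bytes

# BytesIO wraps bytes in a file-like object; StringIO only holds str,
# so it cannot be used for compressed data.
buffer = io.BytesIO(compressed)
with gzip.GzipFile(fileobj=buffer) as gz:
    decompressed = gz.read()                     # gzip bytes -> plain bytes

decoded = decompressed.decode("utf-8")           # bytes -> str
data = json.loads(decoded)                       # str -> Python objects

# 'unicode' is not a registered codec name, so decoding with it fails.
try:
    decompressed.decode("unicode")
    unicode_is_codec = True
except LookupError:
    unicode_is_codec = False
```

So the chain going in is compressed bytes, then decompressed bytes, then str, then Python objects, and `decode('utf-8')` is the step that turns bytes into str.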
So far:
import gzip
import io
import json
import urllib.request
url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
This is where I am stuck. How should I write the next line of code?
json_str = json.loads(decompressed_file.read().decode('utf-8'))
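For comparison, the whole in-memory step can be sketched as one small helper, without the BytesIO wrapper. The name `load_gzipped_json` is hypothetical and the sample record is made up. The sketch assumes the response body is gzip-compressed JSON, which is what the code above implies.

```python
import gzip
import json


def load_gzipped_json(raw: bytes, encoding: str = "utf-8"):
    """Hypothetical helper: gzip bytes -> plain bytes -> str -> Python objects."""
    return json.loads(gzip.decompress(raw).decode(encoding))


# Stand-in for response.read(); the record below is made up.
sample = gzip.compress(json.dumps([{"navn": "Eksempel AS"}]).encode("utf-8"))
records = load_gzipped_json(sample)
```

With the real request, `raw` would presumably be `response.read()` taken inside the `with urllib.request.urlopen(...)` block, and the parsed result could then be passed to `json_normalize`.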
My workaround is to write the data to a JSON file, read it back in, and then convert it to a DataFrame, which works:
from pandas import json_normalize  # top-level import in pandas >= 1.0
f_path = 'brreg.json'
with open(f_path, 'wb') as f:
    f.write(decompressed_file.read())
with open(f_path, encoding='utf-8') as fin:
    d = json.load(fin)
df = json_normalize(d)
with open('brreg_2.csv', 'w', encoding='utf-8', newline='') as fout:
    fout.write(df.to_csv())
I found many SO posts about this, but I am still confused. The first one below explains it quite well, but I still need some spoon-feeding.
Python 3, read/write compressed json objects from/to gzip file
TypeError when trying to convert Python 2.7 code to Python 3.4 code
How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?