How to combine a string of multiple json gz files in a list into one json gz file then open the file?

Question

I have a list of json gz https files

FYI: these files are not real files due to privacy laws but mimic the exact structure.

list_of_files = ['https://premera.saph.com/202011/json.gz', 'https://premera.saph.com/202011/json.gz']

My goal is to combine all these json gz files into one large json gz file.

I've tried numerous ways to do this by referencing other Stack Overflow questions; however, I am unable to find exactly what I am looking for.

This comment helped me somewhat, but in my situation, I believe that I need to add requests to get the file since it is an http.

Python 3, read/write compressed json objects from/to gzip file

import requests
import gzip

one_file = file[0]

with open(one_file, 'rb') as f:
     serial = gzip.decompress(f.read())

Error:

OSError: [Errno 22] Invalid argument: 'https://premera.saph.com/202011/json.gz'

I got this error with the real HTTPS URL; the URL shown here has been changed for privacy.

Can you provide cleaner code? Here list_of_files is not used and file doesn't exist — Bastien B, Jul 07 '22 at 13:07
You'll need to download the file first before you can decompress it. — Sören, Jul 07 '22 at 13:12

score 0 · Answer 1 · answered Jul 07 '22 at 13:13

Assuming list_of_files = file

You are trying to decompress a string (the URL itself). What you need to do is download the content at the URL first; after that you will be able to decompress it.

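import requests

list_of_file_content = []
list_of_files = ['https://premera.saph.com/202011/json.gz', 'https://premera.saph.com/202011/json.gz']

for file in list_of_files:
    # Download the file; r.content holds the raw bytes of the .gz file
    r = requests.get(file)
    # Fail early on an HTTP error instead of storing an error page
    r.raise_for_status()
    list_of_file_content.append(r.content)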
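
The downloaded bytes still have to be decompressed and merged. Here is a minimal sketch of that step, assuming each file holds a single JSON document and that collecting the documents into one JSON array is an acceptable way to combine them. The helper name combine_gz_json is hypothetical, not part of any library.

```python
import gzip
import json


def combine_gz_json(blobs):
    """Return one gzip-compressed JSON array built from several gzip blobs.

    Assumption: every blob is a gzip-compressed file containing exactly
    one JSON document.
    """
    # Decompress each blob and parse it; json.loads accepts UTF-8 bytes
    documents = [json.loads(gzip.decompress(blob)) for blob in blobs]
    # Serialize the list of documents and compress the result again
    return gzip.compress(json.dumps(documents).encode("utf-8"))
```

With the loop above, combine_gz_json(list_of_file_content) would give the combined bytes, which can then be written to disk with open('combined.json.gz', 'wb').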

score 0 · Accepted Answer · answered Jul 07 '22 at 13:17

This comment helped me somewhat, but in my situation, I believe that I need to add requests to get the file since it is an http.

Indeed, the built-in open function does not support HTTP access. In this case I would use urllib.request.urlopen. Consider the following example, which uses an example file provided by Mozilla:

import json
import gzip
import urllib.request
url = "https://wiki.mozilla.org/images/f/ff/Example.json.gz"
with urllib.request.urlopen(url) as gzf:
    with gzip.open(gzf) as jsonf:
        data = json.load(jsonf)
        print(data)

gives output

{'InstallTime': '1295768962', 'Comments': 'Will test without extension.', 'Theme': 'classic/1.0', 'Version': '4.0b10pre', 'id': 'ec8030f7-c20a-464f-9b0e-13a3a9e97384', 'Vendor': 'Mozilla', 'EMCheckCompatibility': 'false', 'Throttleable': '1', 'Email': 'deinspanjer@mozilla.com', 'URL': 'http://nighthacks.com/roller/jag/entry/the_shit_finally_hits_the', 'version': '4.0b10pre', 'CrashTime': '1295903735', 'ReleaseChannel': 'nightly', 'submitted_timestamp': '2011-01-24T13:15:48.550858', 'buildid': '20110121153230', 'timestamp': 1295903748.551002, 'Notes': 'Renderers: 0x22600,0x22600,0x20400', 'StartupTime': '1295768964', 'FramePoisonSize': '4096', 'FramePoisonBase': '7ffffffff0dea000', 'AdapterRendererIDs': '0x22600,0x22600,0x20400', 'Add-ons': 'compatibility@addons.mozilla.org:0.7,enter.selects@agadak.net:6,{d10d0bf8-f5b5-c8b4-a8b2-2b9879e08c5d}:1.3.3,sts-ui@sidstamm.com:0.1,masspasswordreset@johnathan.nightingale:1.04,support@lastpass.com:1.72.0,{972ce4c6-7e08-4474-a285-3208198ce6fd}:4.0b10pre', 'BuildID': '20110121153230', 'SecondsSinceLastCrash': '810473', 'ProductName': 'Firefox', 'legacy_processing': 0}

Explanation: the first with opens the file at the specified URL, then gzip.open decompresses it, so json.load can parse the JSON and return the data (here data is a dict). Note that all the imports used belong to the standard library, so you do not need to install any external package.
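
To cover the original goal of one combined file, the same calls can be wrapped in a loop. This is only a sketch under assumptions: the function names are made up for illustration, each URL is assumed to return one gzip-compressed JSON document, and "combined" is taken to mean a single JSON array holding every document.

```python
import gzip
import json
import urllib.request


def load_gz_json(url):
    """Fetch one gzip-compressed JSON document and return the parsed object."""
    with urllib.request.urlopen(url) as response:
        with gzip.open(response) as jsonf:
            return json.load(jsonf)


def combine_to_file(urls, out_path):
    """Parse every URL and write all documents as one gzipped JSON array."""
    documents = [load_gz_json(url) for url in urls]
    # "wt" opens the gzip file in text mode so json.dump can write str
    with gzip.open(out_path, "wt", encoding="utf-8") as out:
        json.dump(documents, out)


def read_combined(path):
    """Open the combined file again and return the list of documents."""
    with gzip.open(path, "rt", encoding="utf-8") as combined:
        return json.load(combined)
```

Calling combine_to_file(list_of_files, 'combined.json.gz') followed by read_combined('combined.json.gz') would then give a list with one entry per URL. Everything is held in memory at once, so very large files may need a streaming approach instead.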

and if you wanted to iterate through the list of files: Would you do this? ```For i in url: with urllib.request.urlopen(url) as gzf: with gzip.open(gzf) as jsonf: data = json.load(jsonf) print(data) ``` — Zane, Jul 07 '22 at 13:41
@Zane in such case encase that part in `for url in urls:` so `url` pertains to element of list and `urls` is `list` of `str`s — Daweo, Jul 07 '22 at 14:02
