I have a 7z archive containing a large number of JSON files. What's the most efficient (i.e. fastest) way to iterate through the archive and read every JSON?
My aim is to extract a certain key from every JSON in the archive, preferably without extracting and saving the uncompressed files to disk. The result should be a pickled pandas DataFrame.
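For clarity, here is a minimal sketch of what I'm ultimately trying to produce; the key name my_key, the archive name, and the output filename are just placeholders:

    import json
    import pandas as pd
    import py7zr

    records = []
    with py7zr.SevenZipFile('testzip.7z', mode='r') as archive:
        # readall() decompresses every member into an in-memory BytesIO object,
        # so nothing is written to disk
        for fname, bio in archive.readall().items():
            # assuming each file holds a single JSON object
            data = json.load(bio)
            records.append({'file': fname, 'my_key': data.get('my_key')})

    df = pd.DataFrame(records)
    df.to_pickle('result.pkl')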
This question seems to point in the right direction: use py7zr. I found the readall() method, but something goes wrong:
    import py7zr
    from py7zr import FILTER_BROTLI

    filters = [{'id': FILTER_BROTLI, 'level': 9}]

    with py7zr.SevenZipFile('testzip.7z', 'r', filters=filters) as archive:
        # decompress every member into memory and peek at the first bytes
        for fname, bio in archive.readall().items():
            print('{:s}: {:s}...'.format(fname, bio.read(10).hex()))
This only returns:
UnsupportedCompressionMethodError: Unauthorized and modified Brotli data (skipable frame) found.
It seems there is some issue with Brotli. Any clues?