
I have a 7z archive containing lots of JSON files. What is the most efficient (i.e. fastest) way to iterate through the archive and read every single JSON file?

My aim is to extract a certain key from every JSON file in the archive, preferably without extracting and saving the uncompressed files to disk. The result should be a pickled pandas DataFrame.
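A minimal sketch of that goal, assuming each JSON file's top level is an object and that the archive contents are already available in memory as a mapping of member name to binary file object (py7zr's `readall()` returns this shape in the versions that provide it). `collect_key` and `save_as_pickle` are hypothetical helper names, and only the pickling step needs pandas:

```python
import io
import json
from typing import IO, Any, Dict, List, Mapping


def collect_key(files: Mapping[str, IO[bytes]], key: str) -> List[Dict[str, Any]]:
    """Parse every *.json member in memory and pull out one key.

    `files` maps member names to binary file objects (e.g. io.BytesIO),
    so nothing is written to disk. Assumes each JSON top level is an
    object; members that lack `key` get None.
    """
    rows = []
    for name, bio in files.items():
        if not name.lower().endswith(".json"):
            continue  # skip non-JSON members
        data = json.load(bio)  # json accepts UTF-8/16/32 bytes directly
        rows.append({"file": name, key: data.get(key)})
    return rows


def save_as_pickle(rows: List[Dict[str, Any]], path: str) -> None:
    # pandas is imported lazily so collect_key works without it installed
    import pandas as pd

    pd.DataFrame(rows).to_pickle(path)


if __name__ == "__main__":
    demo = {
        "a.json": io.BytesIO(b'{"id": 1, "x": 2}'),
        "b.json": io.BytesIO(b'{"x": 3}'),
    }
    print(collect_key(demo, "id"))
    # [{'file': 'a.json', 'id': 1}, {'file': 'b.json', 'id': None}]
```

With py7zr, `files` would come from something like `archive.readall()`, and `save_as_pickle(collect_key(files, 'some_key'), 'result.pkl')` would write the DataFrame (`'some_key'` and `'result.pkl'` are placeholders). Because `readall()` holds every decompressed file in memory at once, a very large archive may need to be read in smaller batches.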


This question probably points in the right direction by using py7zr. I found the `readall()` method, but something goes wrong:

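```python
import py7zr
from py7zr import FILTER_BROTLI

# filters only affect writing; when reading, py7zr takes the codecs from the archive header
filters = [{'id': FILTER_BROTLI, 'level': 9}]

with py7zr.SevenZipFile('testzip.7z', 'r', filters=filters) as archive:
    # readall() returns a dict mapping member names to in-memory file objects
    for fname, bio in archive.readall().items():
        # print the member name and its first 10 bytes as hex
        print('{:s}: {}...'.format(fname, bio.read(10).hex()))
```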

This only raises:

UnsupportedCompressionMethodError: Unauthorized and modified Brotli data (skipable frame) found.

It seems there is an issue with Brotli. Any clues?
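The error text mentions a "skippable frame". In the Zstandard frame format (RFC 8878), a skippable frame starts with a 4-byte little-endian magic number between 0x184D2A50 and 0x184D2A5F, followed by a 4-byte little-endian frame size; a plain Brotli stream has no such header. Assuming that is what py7zr is detecting here (an assumption, not something the py7zr docs state), a minimal sketch to check whether a chunk of raw compressed bytes starts with such a frame; `skippable_frame_header` is a hypothetical helper name:

```python
import struct
from typing import Optional, Tuple

# Zstandard skippable-frame magic numbers span 0x184D2A50..0x184D2A5F (RFC 8878)
SKIPPABLE_MIN = 0x184D2A50
SKIPPABLE_MAX = 0x184D2A5F


def skippable_frame_header(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (magic, frame_size) if `data` starts with a skippable frame, else None."""
    if len(data) < 8:
        return None  # too short to hold magic number + frame size
    # two unsigned 32-bit little-endian integers: magic, then frame size
    magic, size = struct.unpack_from("<II", data, 0)
    if SKIPPABLE_MIN <= magic <= SKIPPABLE_MAX:
        return magic, size
    return None
```

This only inspects bytes you already have; getting the raw packed stream out of the .7z file is outside the scope of this sketch.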

do-me
I created the file with the 7-Zip GUI (Brotli compression, level 9). However, the [docs](https://py7zr.readthedocs.io/en/stable/api.html#compression-methods) claim that Brotli is supported. – do-me Aug 05 '21 at 15:54
