
This code reads and writes a JSON Lines file. How can I compress it? I tried using gzip.open directly, but I am getting various errors.

import json
    
def dump_jsonl(data, output_path, append=False):
    """
    Write list of objects to a JSON lines file.
    """
    mode = 'a+' if append else 'w'
    with open(output_path, mode, encoding='utf-8') as f:
        for line in data:
            json_record = json.dumps(line, ensure_ascii=False)
            f.write(json_record + '\n')
    print('Wrote {} records to {}'.format(len(data), output_path))

def load_jsonl(input_path) -> list:
    """
    Read list of objects from a JSON lines file.
    """
    data = []
    with open(input_path, 'r', encoding='utf-8') as f:
        for line in f:
            data.append(json.loads(line.rstrip('\r\n')))
    print('Loaded {} records from {}'.format(len(data), input_path))
    return data

This is what I am doing to compress it, but I am unable to read the result back.

def dump_jsonl(data, output_path, append=False):
    with gzip.open(output_path, "a+") as f:
        for line in data:
            json_record = json.dumps(line, ensure_ascii = False)
            encoded = json_record.encode("utf-8") + ("\n").encode("utf-8")
            compressed = gzip.compress(encoded)
            f.write(compressed)
jps
There are a lot of built-in libraries in Python that can help compress files. https://docs.python.org/3/library/archiving.html - What are the errors you are getting with gzip and what code did you use? – Cow Aug 07 '21 at 11:31

1 Answer


Use the gzip module: open the output file with gzip.open() and copy the uncompressed file into it.

import gzip

# Read the source as bytes: a gzip file opened with 'wb' only accepts bytes.
with open('file.jsonl', 'rb') as f_in:
    with gzip.open('file.jsonl.gz', 'wb') as f_out:
        f_out.writelines(f_in)

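gzip.open() opens a gzip-compressed file and compresses whatever you write to it (and decompresses on read), so write plain encoded JSON lines to it. The code in the question also calls gzip.compress() on each record before writing, so the data ends up compressed twice and cannot be read back as JSON lines.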
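If you would rather write the compressed file directly, the two functions from the question could be adapted roughly as follows. This is only a sketch: the names dump_jsonl_gz and load_jsonl_gz are made up for illustration, and it assumes that opening the gzip stream in text mode ('wt', 'at', 'rt') suits your data, so that gzip.open() takes care of both the UTF-8 encoding and the compression.

```python
import gzip
import json


def dump_jsonl_gz(data, output_path, append=False):
    """Write a list of objects to a gzip-compressed JSON lines file."""
    # Text mode ('wt'/'at'): gzip.open encodes to UTF-8 and compresses,
    # so there is no separate gzip.compress() call.
    mode = 'at' if append else 'wt'
    with gzip.open(output_path, mode, encoding='utf-8') as f:
        for record in data:
            f.write(json.dumps(record, ensure_ascii=False) + '\n')


def load_jsonl_gz(input_path):
    """Read a list of objects from a gzip-compressed JSON lines file."""
    with gzip.open(input_path, 'rt', encoding='utf-8') as f:
        # Skip blank lines so a trailing newline does not break json.loads.
        return [json.loads(line) for line in f if line.strip()]
```

Appending adds a new gzip member to the end of the file; the gzip module reads such concatenated members back as one continuous stream, so load_jsonl_gz should return the records from every append.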

Read:

gzip a file in Python

Python support for Gzip

Datajack
FYI: I would highly recommend using the `with` statement to open files. Then you do not need to think about closing the files again. 1 up from me anyways. – Cow Aug 07 '21 at 11:43