I am reading in a large gzipped JSON file (~4 GB). I want to read in the first n
lines.
with gzip.open('/path/to/my/data/data.json.gz', 'rt') as f:
    line_n = f.readlines(1)
    print(ast.literal_eval(line_n[0])['events'])  # a dictionary object
This works fine when I want to read a single line. If I now try to read in a loop, e.g.
no_of_lines = 1
with gzip.open('/path/to/my/data/data.json.gz', 'rt') as f:
    for line in range(no_of_lines):
        line_n = f.readlines(line)
        print(ast.literal_eval(line_n[0])['events'])
my code takes forever to execute, even when the loop has length 1. I'm assuming this behaviour has something to do with how gzip reads files; perhaps when I loop, it tries to obtain information about the file length, which causes the long execution time? Can anyone shed some light on this and potentially suggest an alternative way of doing it?
An edited first line of my data:
['{"events": {"category": "EVENT", "mac_address": "123456", "co_site": "HSTH"}}\n']
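Based on that sample, each line looks like a standalone JSON object, so I suspect something along these lines (using itertools.islice to stop after n lines, and json.loads instead of ast.literal_eval) might be closer to what I want, though I'm not certain it avoids the slowdown:

```python
import gzip
import itertools
import json

def read_first_n(path, n):
    """Read the first n lines of a gzipped file of newline-delimited JSON.

    islice stops the iteration after n lines, so the rest of the
    (decompressed) file is never read.
    """
    with gzip.open(path, 'rt') as f:
        return [json.loads(line)['events'] for line in itertools.islice(f, n)]

# usage (path as in my setup):
# events = read_first_n('/path/to/my/data/data.json.gz', 5)
```

Is this the right approach, or is there something better?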