I have a 47 GB JSON file with the following form:
{
"data": {
"Key_A": {small JSON ~50 MB},
"Key_B": {small JSON ~50 MB},
"Key_C": {large JSON ~47 GB}
}
}
The exact structure and content of file.data.Key_C is unknown, and I'd like to analyze it (e.g. count the key:value pairs, get the size of the items, etc.).
My idea was to read the file line by line, but that did not work. The following script never terminated (I had to abort it while monitoring the session's memory usage):
# file_name holds the path to the 47 GB JSON file
with open(file_name, 'rb') as f:
    for line in f:
        # Check the current line (not file_name) for the key
        if 'Key_C' in line.decode('cp1252'):  # Windows encoding
            break
If the JSON is not indented (I do not know whether it is), the whole file may be a single line, so reading it line by line is not feasible. Streaming it character by character might be a solution, but I am pretty much stuck here.
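
To make the streaming idea concrete, here is a minimal sketch that reads the file in fixed-size binary chunks instead of lines, so memory use stays bounded even if the file is one single line. The function name find_key_offset and the chunk_size parameter are my own placeholders, not part of any library. It assumes the key is plain ASCII/UTF-8 and simply searches for the quoted text, so it would also match "Key_C" if that text happened to appear inside a string value.

```python
def find_key_offset(path, key, chunk_size=1 << 20):
    """Return the byte offset of the first '"key"' in the file, or -1.

    Hypothetical helper: reads fixed-size chunks, so it never needs
    to hold more than roughly one chunk in memory.
    """
    needle = b'"' + key.encode("utf-8") + b'"'
    overlap = len(needle) - 1   # bytes carried over between chunks
    tail = b""                  # end of the previous window
    consumed = 0                # bytes read before the current chunk
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1       # reached end of file without a match
            # Prepend the carried-over bytes so a match that straddles
            # two chunks is still found.
            window = tail + chunk
            pos = window.find(needle)
            if pos != -1:
                return consumed - len(tail) + pos
            consumed += len(chunk)
            tail = window[-overlap:]
```

With the offset in hand, something like f.seek(offset) followed by f.read(2000) should be enough to peek at the start of Key_C and see whether the file is indented at all, before deciding how to parse the rest.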