I have a 47 GB JSON file with the following structure:

{
  "data": {
    "Key_A": {small JSON ~50mb},
    "Key_B": {small JSON ~50mb},
    "Key_C": {large JSON ~47gb}
  }
}

The exact structure and content of file.data.Key_C are unknown, and I'd like to analyze it (e.g. count its key:value pairs, measure the size of its items, etc.).

My idea was to read the file line by line, but that did not work: the following script never terminated (I had to abort it because of the session's memory usage).

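# Scan the file line by line until a line mentions Key_C.
with open(file_name, 'rb') as f:
    for line in f:
        # Decode the line that was read, not the file name.
        if 'Key_C' in line.decode('cp1252'):  # Windows encoding
            break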

If the JSON is not indented (I do not know whether it is), the whole file could be a single line, so reading it line by line would not be feasible. Streaming it character by character might be a solution, but I am pretty much stuck here.
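
A minimal sketch of the streaming idea, reading fixed-size chunks instead of lines so that memory use stays bounded even if the file is one long line. The function name find_key_offset and the chunk_size parameter are hypothetical, chosen only for illustration. It assumes the key is plain ASCII and that the text "Key_C" (with quotes) does not also occur earlier inside a string value of Key_A or Key_B; if it does, the first hit would be a false positive.

```python
def find_key_offset(path, key, chunk_size=1 << 20):
    """Return the byte offset of the first quoted occurrence of key, or -1.

    Reads the file in chunks of chunk_size bytes (hypothetical default: 1 MiB),
    so only about one chunk is held in memory at a time.
    """
    needle = b'"' + key.encode('utf-8') + b'"'
    overlap = len(needle) - 1   # bytes carried over so a split match is not missed
    tail = b''                  # last few bytes of the previous window
    consumed = 0                # total bytes read before the current chunk

    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1       # reached end of file without a match
            window = tail + chunk
            pos = window.find(needle)
            if pos != -1:
                # The window starts len(tail) bytes before the current chunk.
                return consumed - len(tail) + pos
            consumed += len(chunk)
            tail = window[-overlap:]
```

With the offset known, one could open the file again, f.seek(offset) and f.read() a few kilobytes to see how Key_C begins and whether the file is indented, before deciding how to parse the rest.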

user101
