
I have a big JSON file (4.5 GB) that I cannot open with Python all at once. The file consists of a few million lines, each of which is a separate JSON object in curly braces, so the format of the file is:

{JSON}
{JSON}
{JSON}
...

I would like to read the file line by line, or read something like the first 200 lines at once, but I can't figure out how to do this. Would it be possible to read the file line by line and then put the desired parts of the individual JSON objects in a dataframe? Or would the dataframe be too big to handle as well?

Thanks in advance!

D haverkamp
  • See [my answer here](https://stackoverflow.com/questions/46256301/create-valid-json-object-in-python/46256388#46256388) for reading jsonlines. You can do something similar for 200 lines at a time. – roganjosh Sep 21 '18 at 08:18
  • "cannot open with Python all at once", what makes you write that, what have you tried? – MTTI Sep 21 '18 at 08:18
  • Well my computer cannot handle such big files.. – D haverkamp Sep 21 '18 at 08:20
  • If you would like people to help you, it would be good if you could provide as detailed information as possible on what commands/modules you have already tried. Simply saying "Well my computer cannot handle such big files.."/"cannot open with Python all at once" doesn't shed any light on the issue, and if you read the file line by line, I am almost certain that your computer can handle the file. – MTTI Sep 21 '18 at 08:24
  • Does this answer your question? [multiple Json objects in one file extract by python](https://stackoverflow.com/questions/27907633/multiple-json-objects-in-one-file-extract-by-python) – TAbdiukov Dec 11 '19 at 14:39

2 Answers

You can read one line at a time from the file using the file.readline() method.

The desired parts of each JSON object can be kept in memory, but make sure to limit the size and periodically flush the data to another file or a database.
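A minimal sketch of that idea, assuming every non-blank line is one JSON object. The function names (`extract_fields`, `flush`), the `fields` list, and the default `batch_size` are illustrative choices, not part of the original answer. The selected keys are buffered in memory and written out to a smaller file each time the buffer fills up:

```python
import json


def flush(buffer, dst):
    """Write the buffered records to dst as JSON lines, then empty the buffer."""
    for item in buffer:
        dst.write(json.dumps(item) + "\n")
    count = len(buffer)
    buffer.clear()
    return count


def extract_fields(src_path, dst_path, fields, batch_size=10_000):
    """Copy only the keys in `fields` from src_path to dst_path, batch by batch."""
    buffer = []
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        while True:
            line = src.readline()
            if not line:          # an empty string means end of file
                break
            if not line.strip():  # skip blank lines
                continue
            record = json.loads(line)
            # Keep only the requested keys; missing keys become None.
            buffer.append({key: record.get(key) for key in fields})
            if len(buffer) >= batch_size:
                written += flush(buffer, dst)
        written += flush(buffer, dst)  # write whatever is left over
    return written
```

At most `batch_size` reduced records are held in memory at any time, so memory use depends on the batch size and not on the size of the input file. Writing to a database instead would mean replacing the body of `flush`.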

Yuriy

If you open a file the usual Python way and iterate over it, you read it line by line, so only one line is in memory at a time. So you can do this:

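import json

with open('big.json', 'r') as f:
    for line in f:
        record = json.loads(line)  # parse the JSON object on this line
        # select the parts of record you need here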
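
To read only the first 200 lines, or to keep just a few keys per line, the same loop can be wrapped in a generator. This is a sketch under assumptions: `read_records` is a hypothetical helper, and the key names in the usage comments (`"id"`, `"name"`) are placeholders for whatever fields the file actually contains.

```python
import json
from itertools import islice


def read_records(path, fields, limit=None):
    """Yield the selected keys of each JSON line, reading at most `limit` lines."""
    with open(path, "r", encoding="utf-8") as f:
        # islice stops after `limit` lines; limit=None reads to the end of the file.
        for line in islice(f, limit):
            if line.strip():  # skip blank lines
                record = json.loads(line)
                yield {key: record.get(key) for key in fields}


# Usage (key names are placeholders):
# first_200 = list(read_records("big.json", ["id", "name"], limit=200))
# df = pandas.DataFrame(first_200)  # requires pandas to be installed
```

Because only the selected keys are kept, the resulting list (and a dataframe built from it) can be much smaller than the original file; whether it fits in memory depends on how many fields and lines are kept.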
Bernhard