
I have the Twitter dataset (multiple JSON files), but let's start with one file. I have to parse the JSON objects into Python, but json.loads() only parses a single object. A similar question is asked here, but the solutions are not working or not good enough.

1- I cannot convert the JSON objects into a list, as that is inefficient and I have too much data. The proposed solutions are also based on "\n", while my Twitter data objects end like }{ with no newline between them, and I cannot add newlines manually. (The Twitter objects are also not one per line.)

2- The second solution is JSONStream, and there is not much about it in the official documentation.

3- Is there any other efficient way? One option I am considering is MongoDB, but I have never worked with MongoDB, so I don't know whether this is possible with it.

(Screenshot omitted: it shows the length of one tweet object and the }{ boundary between objects.)

import json

with open('sampledata.json', 'r', encoding='utf8') as json_file:
    while True:
        dataobj = json.load(json_file)
        print(dataobj)
print("Printing each JSON Decoded Object")

Error (one tweet object spans 287 lines):

raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 287 column 2 (char 10528)
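The "Extra data" error is exactly what json.loads()/json.load() raise when the input contains more than one top-level JSON value; a minimal reproduction (with made-up objects, not the actual tweet data):

```python
import json

# json.load/json.loads expect exactly ONE top-level JSON value.
# Two concatenated objects reproduce the "Extra data" error:
try:
    json.loads('{"id": 1}{"id": 2}')
except json.JSONDecodeError as e:
    print(e.msg)  # Extra data
```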
Adnan Ali

1 Answer


The while loop used while reading the JSON file is not needed. You can use this to read a JSON file:

import json

def read_json(path):
    # Reads a file containing a single top-level JSON value
    with open(path, 'r') as file:
        return json.load(file)

my_data = read_json('sampledata.json')
bherbruck