
I have the Twitter dataset (multiple JSON files), but let's start with one file. I have to parse the JSON objects into Python, but json.loads() only parses a single object. A similar question is asked here, but the solutions are not working or not good enough.

1- I cannot convert the JSON objects into a list, as that is inefficient and I have too much data. The proposed solutions are also based on "\n", while my Twitter data objects end like }{ with no newline between them, and I cannot add newlines manually. (The Twitter objects are also not one per line.)

2- The second solution is JSONStream, and there is not much about it in the official documentation.

3- Is there any other efficient way? One option I am considering is MongoDB, but I have never worked with MongoDB, so I don't know whether this is possible with it.

(Screenshot omitted: it shows the length of one tweet object and the }{ boundary between objects.)

import json

with open('sampledata.json', 'r', encoding='utf8') as json_file:
    while True:
        dataobj = json.load(json_file)
        print(dataobj)
print("Printing each JSON Decoded Object")

Error (one tweet object spans 287 lines):

raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 287 column 2 (char 10528)
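The "Extra data" error is exactly what json.loads()/json.load() raise when the input contains more than one top-level JSON value; a minimal reproduction (with made-up objects, not the actual tweet data):

```python
import json

# json.load/json.loads expect exactly ONE top-level JSON value.
# Two concatenated objects reproduce the "Extra data" error:
try:
    json.loads('{"id": 1}{"id": 2}')
except json.JSONDecodeError as e:
    print(e.msg)  # Extra data
```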
Adnan Ali

1 Answer


The while loop used while reading the JSON file is not needed. You can use this to read a JSON file:

import json

def read_json(path):
    # Reads a file containing a single top-level JSON value
    with open(path, 'r') as file:
        return json.load(file)

my_data = read_json('sampledata.json')
bherbruck