I have a JSON file with over 900k lines. I want to parse it and load it into a DataFrame. The JSON looks like this:
{"input":{"userId":"user_1"},"output":{"recommendedItems":["item_1","item_2","item_3","item_4","item_5"],"scores":[0.0333953,0.0321211,0.0156664,0.0130226,0.0113141]},"error":null}
{"input":{"userId":"user_2"},"output":{"recommendedItems":["item_1","item_2","item_3","item_4","item_5"],"scores":[0.033348,0.0256025,0.0130969,0.0112574,0.0098816,]},"error":null}
It really looks like that, with no comma at the end of each line. I can already parse it with a loop, but it takes far too long, around 3+ hours. My code looks like this:
with open("try.json") as json_file:
for line in (json_file):
j = json.loads(line)
user_id = j['input']['userId']
product_id = j['output']['recommendedItems']
score = j['output']['scores']
products_rec = pd.DataFrame(j['output'])
products_rec['userId'] = j['input']['userId']
all_result = pd.concat([all_result, products_rec], axis = 0)
Is there any solution to make it faster?
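For reference, this is the direction I was thinking of trying, but I'm not sure it's the right approach: collect plain records in a list and build the DataFrame once at the end, instead of calling pd.concat inside the loop. The names records and all_result are just my placeholders, and exploding multiple columns at once needs pandas 1.3+:

import json
import pandas as pd

records = []
with open("try.json") as json_file:
    for line in json_file:
        j = json.loads(line)
        records.append({
            'userId': j['input']['userId'],
            'recommendedItems': j['output']['recommendedItems'],
            'scores': j['output']['scores'],
        })

# Build the DataFrame once, then expand the parallel lists into rows
all_result = pd.DataFrame(records)
all_result = all_result.explode(['recommendedItems', 'scores'], ignore_index=True)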