
I have a JSON file with over 900k rows that I want to parse and dump into a DataFrame. The JSON looks like this:

{"input":{"userId":"user_1"},"output":{"recommendedItems":["item_1","item_2","item_3","item_4","item_5"],"scores":[0.0333953,0.0321211,0.0156664,0.0130226,0.0113141]},"error":null}
{"input":{"userId":"user_2"},"output":{"recommendedItems":["item_1","item_2","item_3","item_4","item_5"],"scores":[0.033348,0.0256025,0.0130969,0.0112574,0.0098816,]},"error":null}

It really looks like that: each line is a standalone JSON object, with no comma at the end of the line (JSON Lines format). I can already parse it with a loop, but it takes too long, around 3+ hours. My code looks like this:

with open("try.json") as json_file:
    
    for line in (json_file):
        j = json.loads(line)
        
        user_id = j['input']['userId']
        product_id = j['output']['recommendedItems']
        score = j['output']['scores']
        
        products_rec = pd.DataFrame(j['output'])
        products_rec['userId'] = j['input']['userId']
        
        all_result = pd.concat([all_result, products_rec], axis = 0)
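
I suspect the repeated pd.concat is the bottleneck, since it copies the whole accumulated frame on every iteration. Here is a minimal sketch of the list-based version I am considering instead (untested, and assuming every line has the same layout as the sample above): it appends plain dicts and builds the DataFrame once at the end.

import json

import pandas as pd

rows = []
with open("try.json") as json_file:
    for line in json_file:
        j = json.loads(line)
        # flatten each record into one dict per recommended item
        for item, score in zip(j['output']['recommendedItems'],
                               j['output']['scores']):
            rows.append({'recommendedItems': item,
                         'scores': score,
                         'userId': j['input']['userId']})

# a single DataFrame construction after the loop
all_result = pd.DataFrame(rows)

Appending to a Python list is cheap per row, while concat in a loop re-copies all previously accumulated rows each time, so the cost grows with the square of the number of lines.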

Is there any way to make this faster?
