Get nested JSON into Pandas dataframe and CSV

Question

I'm having problems getting the data I need out of this JSON response and into their own columns. The code below produces a df and CSV, I cleaned it up a bit to isolate the data I'm after. In the CSV image below the data I want is in the third column[trade.stats].

On each row in that column there are lists with 6 items in them. I need those to be added to their own columns, in addition some rows have multiple lists in that column I would need that as well. I'm confused on how this would work.

Somehow I need to get it so that if a row has multiple lists in that column (like row 4 in the image), those six items are added to their own row. The end product should be a df and CSV with rows of of all the data extracted from the third column.

import json, time
from websocket import create_connection
import pandas as pd
   
# start with empty dataframe
df = pd.DataFrame()   

ws = create_connection("wss://ws.kraken.com/")

ws.send(json.dumps({
    "event": "subscribe",
    "pair": ["BTC/USD"],
    "subscription": {"name": "trade"}
}))

timeout = time.time() + 60*1
while time.time() < timeout:
    js = json.loads(ws.recv())
    if isinstance(js, dict):
        df = pd.concat([df, pd.json_normalize(js)])
        #df = df.append([df, pd.json_normalize(js)])
    elif isinstance(js, list):
        #df = df.append([df, pd.json_normalize({"event":"trade",
        df = pd.concat([df, pd.json_normalize({"event":"trade",
                                               "trade":{                                                                                                     
                                                   "s0":js[1][0][0],
                                                   "s1":js[1][0][1], 
                                                   "s2":js[1][0][2],
                                                   "s3":js[1][0][3],                                                
                                                   "s4":js[1][0][4],
                                                   "s5":js[1][0][5],  
                                                   "pair":js[3]}
                                              })
                       ])
    
    else:
         f"unknown socket data {js}"
   #data filters
    df = df[df['event'] != 'systemStatus'] 
    df = df[df['event'] != 'subscriptionStatus']
    df = df[df['event'] != 'heartbeat'] 
    
    
    print(df)
    #time.sleep(1)

#column drop for csv
cols = [0,2,3,4,5,6,7] 
df.drop(df.columns[cols],axis=1,inplace=True)
df.columns =['event','price','volume', 'time', 'side', 'orderType', 'misc', 'pair']
csv_file = "kraken-test4.csv"
df.to_csv(csv_file, index=False, encoding='utf-8')  
ws.close()

CSV data:

Here is some of the JSON response, from the beginning:

{'connectionID': 18155540471802097628, 'event': 'systemStatus', 'status': 'online', 'version': '1.7.2'}
{'channelID': 321, 'channelName': 'trade', 'event': 'subscriptionStatus', 'pair': 'XBT/USD', 'status': 'subscribed', 'subscription': {'name': 'trade'}}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['43081.00000', '0.02836870', '1612805543.405350', 'b', 'm', ''], ['43087.20000', '0.12163130', '1612805543.407889', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
[321, [['43087.00000', '0.50000000', '1612805544.764522', 'b', 'm', '']], 'trade', 'XBT/USD']
[321, [['43087.20000', '0.00230000', '1612805545.350573', 'b', 'm', '']], 'trade', 'XBT/USD']
{'event': 'heartbeat'}
{'event': 'heartbeat'}
{'event': 'heartbeat'}
[321, [['43087.20000', '0.06226156', '1612805547.390753', 'b', 'l', '']], 'trade', 'XBT/USD']
[321, [['43081.60000', '0.00300000', '1612805548.090721', 's', 'l', '']], 'trade', 'XBT/USD']

Also, avoid calling `pd.concat` or `DataFrame.append` in a `for-loop` since it leads to [quadratic copying](https://stackoverflow.com/a/36489724/1422451). — Parfait, Feb 08 '21 at 17:22
how would I add to the dataframe then on each loop? I will post that, but I did just revise my code it now is closer to what I want, the only thing now is the mutiple lists issue I mentioned.. Please look at revised code. — robothead, Feb 08 '21 at 17:29
your code seems to have gone in a strange direction.... why concat to then filter? why not navigate to what you want through list slicing? — Rob Raymond, Feb 08 '21 at 17:45
learning on the fly, Im trying whatever I can find to try to get this to work, also that was more or less to make the csv clearer so I could figure out the next part, it wasnt meant to be the final code, I would very much like to learn how to do it properly, this was just effective for the time being. BTW, Thanks for all your help, Im a bit out of my element here. — robothead, Feb 08 '21 at 17:46

Get nested JSON into Pandas dataframe and CSV

0 Answers0