
I have a column "data" whose values are JSON objects, and I would like to split them up into separate columns.

source = {
    '_id': ['SE-DATA-BB3A', 'SE-DATA-BB3E', 'SE-DATA-BB3F'],
    'pi': ['BB3A_CAP_BMLS', 'BB3E_CAP_BMLS', 'BB3F_CAP_PAS'],
    'Datetime': ['190725-122500', '190725-122500', '190725-122500'],
    'data': [
        {'bb3a_bmls': [{'name': 'WAG 01', 'id': '105F', 'state': 'available', 'nodes': 3,
                        'volumes': [{'state': 'available', 'id': '330172', 'name': 'q_-4144d4e'},
                                    {'state': 'available', 'id': '275192', 'name': 'p_3089d821ae'}]}]},
        {'bb3b_bmls': [{'name': 'FEC 01', 'id': '382E', 'state': 'available', 'nodes': 4,
                        'volumes': [{'state': 'unavailable', 'id': '830172', 'name': 'w_-4144d4e'},
                                    {'state': 'unavailable', 'id': '223192', 'name': 'g_3089d821ae'}]}]},
        {'bb3c_bmls': [{'name': 'ASD 01', 'id': '303F', 'state': 'available', 'nodes': 6,
                        'volumes': [{'state': 'unavailable', 'id': '930172', 'name': 'e_-4144d4e'},
                                    {'state': 'unavailable', 'id': '245192', 'name': 'h_3089d821ae'}]}]},
    ],
}

import pandas as pd

input_df = pd.DataFrame(source)

My input_df then has the columns _id, pi, Datetime and data, with one nested dict per row in the data column.

I'm expecting an output_df with one row per volume, keeping the parent columns _id, pi and Datetime alongside the server key and the server and volume details.

I managed to get the columns volume_id, volume_name, volume_state, name, id, state and nodes using the method below.

# expand the per-server dicts in 'data' into separate columns
input_df = input_df.join(input_df['data'].apply(pd.Series))

which expands each dict in data into its own server column (bb3a_bmls, bb3b_bmls, bb3c_bmls) alongside the original columns, with NaN in the rows that have no entry for that server.
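A quick way to confirm which server columns were created (an illustrative check, not required for the later steps):

# list the server columns added by the expansion above
print([c for c in input_df.columns if c.endswith('_bmls')])
# ['bb3a_bmls', 'bb3b_bmls', 'bb3c_bmls']

With those columns in place, the per-server normalization below works on one column at a time.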

from pandas.io.json import json_normalize

Test_df = pd.concat(
    [json_normalize(input_df['bb3a_bmls'][key], 'volumes', ['name', 'id', 'state', 'nodes'],
                    record_prefix='volume_')
     for key in input_df.index if isinstance(input_df['bb3a_bmls'][key], list)]
).reset_index(drop=True)

which gives, for one server (bb3a_bmls), the flattened volume rows with the volume_* columns plus name, id, state and nodes.

Now I have no idea how to get the parent columns "_id", "pi" and "Datetime" back into this result.

Tad
  • Maybe you want to check this out: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html – user96564 Aug 31 '19 at 19:26
  • Possible duplicate of [Unfold a nested dictionary with lists into a pandas DataFrame](https://stackoverflow.com/questions/47840093/unfold-a-nested-dictionary-with-lists-into-a-pandas-dataframe) – ndclt Aug 31 '19 at 20:14

1 Answer


The idea is to loop over the nested lists and dicts and build a list of dictionaries to pass to the DataFrame constructor:

out = []
zipped = zip(source['_id'], source['pi'], source['Datetime'], source['data'])

for a,b,c,d in zipped:
    for k1, v1 in d.items():
        for e in v1:
            # keep all values of the dict except the nested 'volumes' list
            di = {k2: v2 for k2, v2 in e.items() if k2 != 'volumes'}
            # for each dict in 'volumes', prefix its keys with 'volume_'
            for f in e['volumes']:
                di1 = {f'volume_{k3}': v3 for k3, v3 in f.items()}
                # dict with the parent values of the current row
                di2 = {'_id': a, 'pi': b, 'Datetime': c, 'SERVER': k1}
                # append the merged dictionaries as one flat record
                out.append({**di2, **di1, **di})

df = pd.DataFrame(out)
print (df)

            _id             pi       Datetime     SERVER volume_state  \
0  SE-DATA-BB3A  BB3A_CAP_BMLS  190725-122500  bb3a_bmls    available   
1  SE-DATA-BB3A  BB3A_CAP_BMLS  190725-122500  bb3a_bmls    available   
2  SE-DATA-BB3E  BB3E_CAP_BMLS  190725-122500  bb3b_bmls  unavailable   
3  SE-DATA-BB3E  BB3E_CAP_BMLS  190725-122500  bb3b_bmls  unavailable   
4  SE-DATA-BB3F   BB3F_CAP_PAS  190725-122500  bb3c_bmls  unavailable   
5  SE-DATA-BB3F   BB3F_CAP_PAS  190725-122500  bb3c_bmls  unavailable   

  volume_id   volume_name    name    id      state  nodes  
0    330172    q_-4144d4e  WAG 01  105F  available      3  
1    275192  p_3089d821ae  WAG 01  105F  available      3  
2    830172    w_-4144d4e  FEC 01  382E  available      4  
3    223192  g_3089d821ae  FEC 01  382E  available      4  
4    930172    e_-4144d4e  ASD 01  303F  available      6  
5    245192  h_3089d821ae  ASD 01  303F  available      6  
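
For reference, the same flat table can also be built with json_normalize (the function linked in the comments above), passing the parent fields through its meta argument; a minimal sketch, assuming pandas >= 1.0 where json_normalize is exposed at the top level:

# build one intermediate record per server entry, carrying the parent fields along
records = []
for _id, pi, dt, d in zip(source['_id'], source['pi'], source['Datetime'], source['data']):
    for server, entries in d.items():
        for entry in entries:
            records.append({'_id': _id, 'pi': pi, 'Datetime': dt, 'SERVER': server, **entry})

# flatten the nested 'volumes' lists while keeping the parent columns via meta
df = pd.json_normalize(
    records,
    record_path='volumes',
    meta=['_id', 'pi', 'Datetime', 'SERVER', 'name', 'id', 'state', 'nodes'],
    record_prefix='volume_',
)

The column order differs slightly (the volume_* columns come first), but the rows match the output above.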
jezrael
  • Thanks jezrael!! By the way, it looks like after processing the 1st element of "volumes", instead of processing the 2nd element, it is processing the 1st element again, thus generating duplicates. – Tad Sep 03 '19 at 18:10
  • @Tad - You are right, the answer was edited a bit for the correct output. – jezrael Sep 04 '19 at 05:59
  • Excellent!! Thanks a lot jezrael!! – Tad Sep 04 '19 at 06:28