I am starting with JSON-like data (a list of dicts) formatted like this:

training = [{'category': ['Monetary'],
  'id': 82321187,
  'idx': 9839,
  'subcategory': ['stop loss'],
  'target_num': ['7.38']},
 {'category': ['Temporal',
   'Product Number',
   'Product Number',
   'Product Number',
   'Monetary'],
  'id': 71568007,
  'idx': 5202,
  'subcategory': ['date',
   'Product Number',
   'Product Number',
   'Product Number',
   'money'],
  'target_num': ['2017', '343', '60', '080', '60']}]

I want the generated dataframe to look like:

category            id        idx   subcategory       target_num
'Monetary'          82321187  9839  'stop loss'       '7.38'
'Temporal'          71568007  5202  'date'            '2017'
'Product Number'    71568007  5202  'Product Number'  '343'
'Product Number'    71568007  5202  'Product Number'  '60'
'Product Number'    71568007  5202  'Product Number'  '080'
'Monetary'          71568007  5202  'money'           '60'

I tried json_normalize:

import pandas as pd

# pd.json_normalize is the top-level entry point in pandas >= 1.0
print(pd.json_normalize(training, meta=['idx']))

The result is:

                                             category        id   idx  \
 0                                         [Monetary]  82321187  9839   
 1  [Temporal, Product Number, Product Number, Pro...  71568007  5202   

                                          subcategory                target_num  
 0                                        [stop loss]                    [7.38]  
 1  [date, Product Number, Product Number, Product...  [2017, 343, 60, 080, 60] 

The attributes with list values are kept as lists instead of being broken into separate rows.

EDIT:

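Solution adapted from the referenced question:

import numpy as np
import pandas as pd

df = pd.DataFrame(training)

def chris2(df):
    # Number of list elements per row, taken from the category column
    category_vals = df.category.values.tolist()
    rs = [len(r) for r in category_vals]
    # Repeat the scalar columns once per list element
    id_vals = np.repeat(df.id.values, rs)
    idx_vals = np.repeat(df.idx.values, rs)
    subcategory_vals = df.subcategory.values.tolist()
    target_num_vals = df.target_num.values.tolist()
    # Note: column_stack builds one string array, so id and idx come back as strings
    return pd.DataFrame(np.column_stack((np.concatenate(category_vals), 
                                         id_vals,
                                         idx_vals,
                                         np.concatenate(subcategory_vals),
                                         np.concatenate(target_num_vals),
                                        )), columns=df.columns)
chris2(df)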
lapolonio
Check the link https://stackoverflow.com/questions/53218931/how-do-i-unnest-a-column-in-a-pandas-dataframe – BENY Nov 16 '18 at 16:45

1 Answer

Unnest the list columns with the unnesting helper from the linked question:

df = pd.DataFrame(training)
unnesting(df, ['category', 'subcategory', 'target_num'])
Out[48]: 
         category     subcategory target_num        id   idx
0        Monetary       stop loss       7.38  82321187  9839
1        Temporal            date       2017  71568007  5202
1  Product Number  Product Number        343  71568007  5202
1  Product Number  Product Number         60  71568007  5202
1  Product Number  Product Number        080  71568007  5202
1        Monetary           money         60  71568007  5202
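
The answer calls unnesting without showing its definition; it comes from the linked question. Below is a minimal sketch of such a helper, not necessarily identical to the linked version. It assumes every list column in a given row has the same length, and the name `flat` and the use of `map(len)` are choices made for this illustration.

```python
import numpy as np
import pandas as pd

training = [
    {'category': ['Monetary'], 'id': 82321187, 'idx': 9839,
     'subcategory': ['stop loss'], 'target_num': ['7.38']},
    {'category': ['Temporal', 'Product Number', 'Product Number',
                  'Product Number', 'Monetary'],
     'id': 71568007, 'idx': 5202,
     'subcategory': ['date', 'Product Number', 'Product Number',
                     'Product Number', 'money'],
     'target_num': ['2017', '343', '60', '080', '60']},
]

def unnesting(df, explode):
    """Return one row per list element for the list-valued columns in `explode`.

    Assumes all columns in `explode` hold lists of equal length within a row.
    """
    # Repeat each row label once per element of that row's list
    lengths = df[explode[0]].map(len).to_numpy()
    idx = df.index.repeat(lengths)
    # Flatten every list column into one long column
    flat = pd.concat(
        [pd.DataFrame({col: np.concatenate(df[col].values)}) for col in explode],
        axis=1,
    )
    flat.index = idx
    # Re-attach the scalar columns (id, idx) by matching on the row label
    return flat.join(df.drop(columns=explode), how='left')

df = pd.DataFrame(training)
result = unnesting(df, ['category', 'subcategory', 'target_num'])
print(result)
```

Unlike the column_stack approach, the join keeps id and idx as integers. On pandas 1.3 or newer, `df.explode(['category', 'subcategory', 'target_num'])` should give equivalent rows without a custom helper.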
BENY