I'm starting with JSON data formatted like this:
training = [{'category': ['Monetary'],
             'id': 82321187,
             'idx': 9839,
             'subcategory': ['stop loss'],
             'target_num': ['7.38']},
            {'category': ['Temporal',
                          'Product Number',
                          'Product Number',
                          'Product Number',
                          'Monetary'],
             'id': 71568007,
             'idx': 5202,
             'subcategory': ['date',
                             'Product Number',
                             'Product Number',
                             'Product Number',
                             'money'],
             'target_num': ['2017', '343', '60', '080', '60']}]
I want the generated dataframe to look like:
category          id        idx   subcategory       target_num
'Monetary'        82321187  9839  'stop loss'       '7.38'
'Temporal'        71568007  5202  'date'            '2017'
'Product Number'  71568007  5202  'Product Number'  '343'
'Product Number'  71568007  5202  'Product Number'  '60'
'Product Number'  71568007  5202  'Product Number'  '080'
'Monetary'        71568007  5202  'money'           '60'
I tried json_normalize:
from pandas.io.json import json_normalize
print (json_normalize(training, meta=['idx']))
The result is:
category id idx \
0 [Monetary] 82321187 9839
1 [Temporal, Product Number, Product Number, Pro... 71568007 5202
subcategory target_num
0 [stop loss] [7.38]
1 [date, Product Number, Product Number, Product... [2017, 343, 60, 080, 60]
The attributes with list values are kept as lists instead of being broken out into separate rows.
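To make the desired reshaping concrete, here is a minimal sketch in plain Python that builds one row per list element before handing the rows to pandas. It assumes the three list-valued fields always have the same length within a record; the names `rows` and `flat` are just illustrative.

```python
import pandas as pd

training = [
    {"category": ["Monetary"], "id": 82321187, "idx": 9839,
     "subcategory": ["stop loss"], "target_num": ["7.38"]},
    {"category": ["Temporal", "Product Number", "Product Number",
                  "Product Number", "Monetary"],
     "id": 71568007, "idx": 5202,
     "subcategory": ["date", "Product Number", "Product Number",
                     "Product Number", "money"],
     "target_num": ["2017", "343", "60", "080", "60"]},
]

# Walk the three parallel lists together and emit one flat dict per element.
# Assumption: the lists in a record are equally long (zip would silently
# truncate to the shortest one otherwise).
rows = [
    {"category": cat, "id": rec["id"], "idx": rec["idx"],
     "subcategory": sub, "target_num": num}
    for rec in training
    for cat, sub, num in zip(rec["category"], rec["subcategory"], rec["target_num"])
]

flat = pd.DataFrame(rows, columns=["category", "id", "idx", "subcategory", "target_num"])
print(flat)
```

This keeps `id` and `idx` as integers and `target_num` as strings, so a value like '080' keeps its leading zero.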
EDIT:
Here is the solution adapted from the referenced question:
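import numpy as np
import pandas as pd

def chris2(df):
    # Length of each row's list; assumes all list columns in a row have equal length
    category_vals = df.category.values.tolist()
    rs = [len(r) for r in category_vals]
    # Repeat the scalar columns once per list element
    id_vals = np.repeat(df.id.values, rs)
    idx_vals = np.repeat(df.idx.values, rs)
    subcategory_vals = df.subcategory.values.tolist()
    target_num_vals = df.target_num.values.tolist()
    # Note: np.column_stack coerces every column (including id and idx) to strings
    return pd.DataFrame(np.column_stack((np.concatenate(category_vals),
                                         id_vals,
                                         idx_vals,
                                         np.concatenate(subcategory_vals),
                                         np.concatenate(target_num_vals),
                                         )), columns=df.columns)

df = pd.DataFrame(training)
chris2(df)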
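For comparison, here is a hedged sketch of the same reshaping using `DataFrame.explode` with several columns at once. This assumes pandas 1.3 or newer (where `explode` accepts a list of columns) and that the lists in each row have matching lengths; otherwise pandas raises a `ValueError`. The names `list_cols` and `exploded` are illustrative only.

```python
import pandas as pd

training = [
    {"category": ["Monetary"], "id": 82321187, "idx": 9839,
     "subcategory": ["stop loss"], "target_num": ["7.38"]},
    {"category": ["Temporal", "Product Number", "Product Number",
                  "Product Number", "Monetary"],
     "id": 71568007, "idx": 5202,
     "subcategory": ["date", "Product Number", "Product Number",
                     "Product Number", "money"],
     "target_num": ["2017", "343", "60", "080", "60"]},
]

df = pd.DataFrame(training)

# Columns whose cells hold lists that should be expanded in lockstep.
list_cols = ["category", "subcategory", "target_num"]

# Multi-column explode (pandas >= 1.3); scalar columns such as id and idx
# are repeated automatically, and ignore_index=True renumbers the rows.
exploded = df.explode(list_cols, ignore_index=True)
print(exploded)
```

Unlike the `np.column_stack` approach, this leaves `id` and `idx` as integer columns rather than converting them to strings.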