0

I have the following data structure

+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 EMP_ID     ¦ ATTR1_OLD_VAL ¦                                                                                        ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL ¦ ATTR3_NEW_VAL   ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+-------------------------------------------------------------------------------------------¦
¦      E001 ¦ xyz           ¦ [{"codeTs":"12345567 ","goodsAttrName":"test1"},{"codeTs":"6402910000","goodsAttrName":"test2"}]      ¦ mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
       E002 ¦ 1234          ¦                                                                                                       ¦ 123           ¦ jjj           ¦ iii           ¦ bb             ¦
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

I want to process it into the following data

+---------------------------------------------------------------------------------------------------------------------------------------------+
 EMP_ID     ¦ ATTR1_OLD_VAL ¦ codeTs     ¦  goodsAttrName  ¦ATTR1_NEW_VAL ¦ ATTR2_OLD_VAL ¦ ATTR2_NEW_VAL ¦ ATTR3_OLD_VAL    ¦ ATTR3_NEW_VAL  ¦
¦-----------+------------+---------------+---------------+---------------+---------------+---------------+------------------------------------¦
¦      E001 ¦ xyz           ¦ 12345567   ¦  test1          ¦               ¦   mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
¦      E001 ¦ xyz           ¦ 6402910000 ¦  test2          ¦               ¦   mmm           ¦ nnn           ¦ zzz           ¦ aaa            ¦
¦      E002 ¦ 1234          ¦            ¦                 ¦               ¦   123           ¦  jjj          ¦ iii           ¦ bb             ¦                                                                      
+---------------------------------------------------------------------------------------------------------------------------------------------+

I tried the following code didn't work

import pymongo as pm
import pandas as pd
import numpy as np
import datetime as dt
from bson import json_util
from pandas.io.json import json_normalize
import json

client = pm.MongoClient('mongodb://user1:user1@127.0.0.1:27017')
db = client['my_db']
mongo_data = list(db['ATTR1_NEW_VAL'].find({}))
sanitized = json.loads(json_util.dumps(mongo_data))

normalized = json_normalize(sanitized)
df = pd.DataFrame(normalized)
print(df)
hakukou
  • 111
  • 1
  • 10
  • `json.loads(json_util.dumps(mongo_data))` is _really_ wonky. Why would you do that? – roganjosh Dec 07 '19 at 13:29
  • Check this [link](https://stackoverflow.com/questions/38231591/splitting-dictionary-list-inside-a-pandas-column-into-separate-columns) – davidbilla Dec 07 '19 at 13:33

1 Answers1

0

Try this:

Explode the list of dicts into to rows using df.explode first and then convert the dict values to columns by apply(pd.Series)

df = df.explode('ATTR1_NEW_VAL')
print(pd.concat([df.drop(['ATTR1_NEW_VAL'], axis=1), df['ATTR1_NEW_VAL'].apply(pd.Series)], axis=1))
davidbilla
  • 2,120
  • 1
  • 15
  • 26