-1

I have following dict which I want to convert into pandas. this dict have nested list which can appear for one node but not other.

dis={"companies": [{"object_id": 123,
                           "name": "Abd ",
                           "contact_name": ["xxxx",
                                                                       "yyyy"],
                           "contact_id":[1234,
                                                                     33455]
                           },
                          {"object_id": 654,
                           "name": "DDSPP"},
                          {"object_id": 987,
                           "name": "CCD"}
                          ]}

AS

object_id, name, contact_name, contact_id
123,Abd,xxxx,1234
123,Abd,yyyy,
654,DDSPP,,
987,CCD,,

How can i achive this

I was trying to do like

abc = pd.DataFrame(dis).set_index['object_id','contact_name']

but it says

'method' object is not subscriptable

Ganesh Jadhav
  • 616
  • 2
  • 7
  • 21

2 Answers2

0

This is inspired from @jezrael answer in this link: Splitting multiple columns into rows in pandas dataframe

Use:

s = {"companies": [{"object_id": 123,
                       "name": "Abd ",
                       "contact_name": ["xxxx",
                                                                   "yyyy"],
                       "contact_id":[1234,
                                                                 33455]
                       },
                      {"object_id": 654,
                       "name": "DDSPP"},
                      {"object_id": 987,
                       "name": "CCD"}
                      ]}
df = pd.DataFrame(s) #convert into DF
df = df['companies'].apply(pd.Series) #this splits the internal keys and values into columns
split1 = df.apply(lambda x: pd.Series(x['contact_id']), axis=1).stack().reset_index(level=1, drop=True)
split2 = df.apply(lambda x: pd.Series(x['contact_name']), axis=1).stack().reset_index(level=1, drop=True)
df1 = pd.concat([split1,split2], axis=1, keys=['contact_id','contact_name'])
pd.options.display.float_format = '{:.0f}'.format
print (df.drop(['contact_id','contact_name'], axis=1).join(df1).reset_index(drop=True))

Output with regular index:

    name  object_id  contact_id contact_name
0   Abd         123        1234         xxxx
1   Abd         123       33455         yyyy
2   DDSPP       654         nan          NaN
3   CCD         987         nan          NaN

Is this something you were looking for?

anky
  • 74,114
  • 11
  • 41
  • 70
0

If you have just only one column needs to convert, then you can use something more shortly, like this:

df = pd.DataFrame(d['companies'])
d = df.loc[0].apply(pd.Series)
d[1].fillna(d[0], inplace=True)
df.drop([0],0).append(d.T)

Otherwise, if you need to do this with more then one raw, you can use it, but it have to be modified.