
I have a dictionary that has the following structure:

mydict = {
    'username1': {'attributes': {'AccountType': ['01'],
                                 'UnitCode': ['001'],
                                 'UnitDesc': ['Marketing'],
                                 'title': ['Assistant']},
                  'extra': 'CN=000000'},
    'username2': {'attributes': {'AccountType': ['01'],
                                 'UnitCode': ['002'],
                                 'UnitDesc': ['Resources'],
                                 'title': ['Manager']},
                  'extra': 'CN=000000'},
}

I am trying to achieve this output in a DataFrame:

+-----------+-------------+----------+
|           | AccountType | UnitCode |
+-----------+-------------+----------+
| username1 |          01 |      001 |
| username2 |          01 |      002 |
+-----------+-------------+----------+

The answer provided on a similar question here got me closer to what I was after, but I need to drop the data in the 'extra' section.

df = pd.concat({k: pd.DataFrame(v).T for k, v in mydict.items()}, axis=0)

When I call df.index it looks like this:

MultiIndex([('username1', 'attributes'),
            ('username1',      'extra'),
            ('username2', 'attributes'),
            ('username2',      'extra')])

I've attempted to remove the 'extra' content from the DataFrame after conversion, and also from the dictionary before converting, with no success. I'm not sure how to stop the duplication.

3 Answers


You can drop the 'extra' rows by resetting the index and filtering on the new level_1 column:

df = df.reset_index()
df = df.loc[df['level_1'].ne('extra')]
     level_0     level_1 AccountType UnitCode     UnitDesc        title
0  username1  attributes        [01]    [001]  [Marketing]  [Assistant]
2  username2  attributes        [01]    [002]  [Resources]    [Manager]
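Alternatively, since the concat from the question already produces a MultiIndex with 'attributes'/'extra' as the second level, you can select the 'attributes' rows with `xs`, which drops that level in one step. A minimal sketch, rebuilding the question's `mydict` and unwrapping the one-element lists with `.str[0]`:

```python
import pandas as pd

# Reconstruction of the question's dictionary
mydict = {
    'username1': {'attributes': {'AccountType': ['01'], 'UnitCode': ['001'],
                                 'UnitDesc': ['Marketing'], 'title': ['Assistant']},
                  'extra': 'CN=000000'},
    'username2': {'attributes': {'AccountType': ['01'], 'UnitCode': ['002'],
                                 'UnitDesc': ['Resources'], 'title': ['Manager']},
                  'extra': 'CN=000000'},
}

# The question's concat: rows indexed by (username, 'attributes'/'extra')
df = pd.concat({k: pd.DataFrame(v).T for k, v in mydict.items()}, axis=0)

# xs selects the 'attributes' rows and removes that index level;
# .str[0] pulls the single element out of each list-valued cell
out = df.xs('attributes', level=1)[['AccountType', 'UnitCode']]
out = out.apply(lambda col: col.str[0])
print(out)
```

This avoids the `reset_index` round trip entirely and leaves the usernames as the index, matching the desired output.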
Kenan

If you can put your dictionaries into a list:

x = [{'username1': {  'attributes': { 'AccountType': ['01'], 
                            'UnitCode': ['001'], 
                            'UnitDesc': ['Marketing'], 
                            'title': ['Assistant'], }, 
            'extra': 'CN=000000'    }},
 {'username2': {  'attributes': { 'AccountType': ['01'], 
                            'UnitCode': ['002'], 
                            'UnitDesc': ['Resources'], 
                            'title': ['Manager'], }, 
            'extra': 'CN=000000'    }},
{'username3': {  'attributes': { 'AccountType': ['01'], 
                            'UnitCode': ['003'], 
                            'UnitDesc': ['Marketing_3'], 
                            'title': ['Assistant_3'], }, 
            'extra': 'CN=000000'    }},
 {'username4': {  'attributes': { 'AccountType': ['01'], 
                            'UnitCode': ['004'], 
                            'UnitDesc': ['Resources_4'], 
                            'title': ['Manager_4'], }, 
            'extra': 'CN=000000'    }}]


# Build one frame per user from its 'attributes' dict, stack them,
# then label the rows with the usernames and drop the unwanted columns
df = pd.DataFrame(x[0]['username1']['attributes'])
for i in range(1, len(x)):
    df = pd.concat([df, pd.DataFrame(x[i][list(x[i].keys())[0]]['attributes'])])
df.index = [list(x[i].keys())[0] for i in range(len(x))]
df.drop(['UnitDesc', 'title'], axis=1, inplace=True)



          AccountType UnitCode
username1          01      001
username2          01      002
username3          01      003
username4          01      004
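For reference, the loop can also be avoided: collect just the 'attributes' sub-dicts into one mapping and hand it to `DataFrame.from_dict`. A sketch using the same list shape as above, trimmed to two users for brevity:

```python
import pandas as pd

x = [
    {'username1': {'attributes': {'AccountType': ['01'], 'UnitCode': ['001'],
                                  'UnitDesc': ['Marketing'], 'title': ['Assistant']},
                   'extra': 'CN=000000'}},
    {'username2': {'attributes': {'AccountType': ['01'], 'UnitCode': ['002'],
                                  'UnitDesc': ['Resources'], 'title': ['Manager']},
                   'extra': 'CN=000000'}},
]

# One row per username, columns from the attribute keys; 'extra' never enters
records = {user: rec['attributes'] for d in x for user, rec in d.items()}
df = pd.DataFrame.from_dict(records, orient='index')[['AccountType', 'UnitCode']]
df = df.apply(lambda col: col.str[0])  # unwrap the one-element lists
print(df)
```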

...another option

# Collect the per-user records from the original dictionary
dict_list = (mydict['username1'], mydict['username2'])

# Required column names for the dataframe
attribute_list = ('AccountType', 'UnitCode')

# Required index for the dataframe
username_list = ('username1', 'username2')

# Create empty dataframe (a list, not a set, keeps the column order stable)
df = pd.DataFrame(columns=['AccountType', 'UnitCode'], index=username_list)

# Fill dataframe (df.loc avoids chained-assignment warnings)
for num, username in enumerate(username_list):
    for attribute in attribute_list:
        df.loc[username, attribute] = str(dict_list[num]['attributes'][attribute])

# Clean data: strip the "['" and "']" left over from str() on the lists
for col in df.columns:
    df[col] = df[col].map(lambda x: str(x)[:-2])
    df[col] = df[col].map(lambda x: str(x)[2:])
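The `str()` round trip and the slicing cleanup can both be skipped by taking the first element of each one-element list while filling the frame. A minimal sketch, assuming the question's `mydict` shape:

```python
import pandas as pd

# Stand-in for the question's dictionary
mydict = {
    'username1': {'attributes': {'AccountType': ['01'], 'UnitCode': ['001'],
                                 'UnitDesc': ['Marketing'], 'title': ['Assistant']},
                  'extra': 'CN=000000'},
    'username2': {'attributes': {'AccountType': ['01'], 'UnitCode': ['002'],
                                 'UnitDesc': ['Resources'], 'title': ['Manager']},
                  'extra': 'CN=000000'},
}

attribute_list = ('AccountType', 'UnitCode')

# Indexing with [0] unwraps each one-element list, so no string cleanup is needed
df = pd.DataFrame(
    {attr: [rec['attributes'][attr][0] for rec in mydict.values()]
     for attr in attribute_list},
    index=list(mydict),
)
print(df)
```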
Dean