I have a dataframe

df =
          name     age     character
0          A        10       fire
1          A        15       water
2          A        20       earth
3          A        25       air
4          B        10       fire
5          B        7        air

I want to convert this dataframe to a dictionary keyed by name, so that the output is:

dic = {'A': [[10, 15, 20, 25], ['fire', 'water', 'earth', 'air']],
       'B': [[10, 7], ['fire', 'air']] }

What I tried is,

from collections import defaultdict
dic = defaultdict(list)
for i in range(len(df)):
    # .loc takes the row label first, then the column label
    dic[df.loc[i, 'name']].append(df.loc[i, 'age'])
    dic[df.loc[i, 'name']].append(df.loc[i, 'character'])  # this is wrong: it appends to the same flat list

If I declare dic = defaultdict([[], []]), it raises an error saying that the first argument of defaultdict must be callable or None.
How can I build the dictionary in the shape shown above?
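The error arises because defaultdict expects a factory function, not a ready-made value. A minimal sketch of one way around it, assuming the sample data above (the DataFrame construction here is reconstructed from the printed table), is to wrap the template in a lambda so every new key gets its own pair of lists:

```python
from collections import defaultdict

import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B"],
    "age": [10, 15, 20, 25, 10, 7],
    "character": ["fire", "water", "earth", "air", "fire", "air"],
})

# defaultdict needs a callable; the lambda builds a fresh [[], []]
# for each missing key, so no two names share the same lists.
dic = defaultdict(lambda: [[], []])
for name, age, character in zip(df["name"], df["age"], df["character"]):
    dic[name][0].append(age)        # first inner list collects ages
    dic[name][1].append(character)  # second inner list collects characters

dic = dict(dic)  # convert to a plain dict if the default behaviour is no longer wanted
```

Iterating with zip over the columns also avoids the row-by-row .loc lookups, which tend to be slow on larger frames.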

jayko03

2 Answers

Here's a solution that returns NumPy arrays instead of lists, which is close enough for most purposes:

{k: d[['age','character']].T.to_numpy() for k,d in df.groupby('name')}

Output:

{'A': array([[10, 15, 20, 25],
        ['fire', 'water', 'earth', 'air']], dtype=object), 
'B': array([[10, 7],
        ['fire', 'air']], dtype=object)}
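If plain Python lists are needed rather than object arrays, the same groupby loop can be adapted. This is only a sketch under the assumption that the columns are named exactly as in the question; the DataFrame below is rebuilt from the question's table:

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "name": ["A", "A", "A", "A", "B", "B"],
    "age": [10, 15, 20, 25, 10, 7],
    "character": ["fire", "water", "earth", "air", "fire", "air"],
})

# For each name, take the two columns separately and convert each
# to a list; tolist() also turns NumPy integers into Python ints.
plain = {
    name: [group["age"].tolist(), group["character"].tolist()]
    for name, group in df.groupby("name")
}
```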
Quang Hoang

You can use a combination of pivot_table and to_dict:

dic = (df.pivot_table(columns='name', values=['age','character'], aggfunc=list)
         .to_dict('list'))

Output:
{'A': [[10, 15, 20, 25], ['fire', 'water', 'earth', 'air']],
 'B': [[10, 7], ['fire', 'air']]}

If your dataframe has exactly the 3 columns name, age, and character, you can simply omit the values= parameter:

dic = df.pivot_table(columns='name', aggfunc=list).to_dict('list')

As you said in your comment, to strip whitespace you need to pre-process df with str.strip before calling pivot_table, as follows:

df.update(df.select_dtypes('object').apply(lambda x: x.str.strip()))
dic = df.pivot_table(columns='name', aggfunc=list).to_dict('list')
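To see why the stripping step matters, here is a small sketch with padded values. The padded sample data is made up for illustration, and the text columns are named explicitly instead of being selected by dtype, which is an assumption about the frame's layout:

```python
import pandas as pd

# Hypothetical sample with stray spaces around the text values.
df = pd.DataFrame({
    "name": [" A", "A ", "B "],
    "age": [10, 15, 7],
    "character": ["fire ", " water", " air "],
})

# Without stripping, " A" and "A " would be treated as different groups.
groups_before = df["name"].nunique()

# Strip each text column by name (assumes these two columns hold strings).
for col in ["name", "character"]:
    df[col] = df[col].str.strip()

dic = {
    name: [group["age"].tolist(), group["character"].tolist()]
    for name, group in df.groupby("name")
}
```

After stripping, the two spellings of A collapse into a single key.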
Andy L.