4

I have a pandas dataframe as below:

df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})
df

which looks like

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get:

red     d
yellow  c
blue    b

The dataset is quite large, so please avoid any iterative method. I haven't figured out a solution yet. Any help is appreciated.

singh

6 Answers

9

First of all, if you really want to convert this to a dictionary, it's a little nicer to convert the value you want as a key into the index of the DataFrame:

df.set_index('a', inplace=True)

This looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be in "one-hot" encoding. You first have to reverse that with idxmax, which returns the label of the column holding each row's maximum:

series = df.idxmax(axis=1)

This looks like:

a
red       d
yellow    c
blue      b
dtype: object

Almost there! Now use to_dict on the resulting Series (this is where setting column a as the index helps out):

series.to_dict()

This looks like:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

Which I think is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
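
One caveat with the steps above: idxmax returns the first column of a row when the row is all zeros or has ties, so it only reverses the encoding correctly if every row really is one-hot. A minimal sketch that checks this first, assuming the sample frame from the question (the helper name one_hot_to_dict and its key parameter are hypothetical, not part of pandas):

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})


def one_hot_to_dict(frame, key='a'):
    """Map each value of `key` to the label of the column holding the 1.

    Hypothetical helper for illustration only.
    """
    flags = frame.set_index(key)
    # Check the one-hot assumption: only 0/1 values, exactly one 1 per row.
    is_binary = flags.isin([0, 1]).all().all()
    one_per_row = flags.sum(axis=1).eq(1).all()
    if not (is_binary and one_per_row):
        raise ValueError("every row must contain exactly one 1 and otherwise 0s")
    # Same chain as the one-liner: column label of the row maximum, then dict.
    return flags.idxmax(axis=1).to_dict()


result = one_hot_to_dict(df)
print(result)
```

The check is vectorized as well, so it should not change the non-iterative character of the approach.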
PaSTE
2

You can try this.

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b
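
The result above is a DataFrame rather than the dictionary the question asks for. One possible way to finish the conversion, sketched under the assumption that each row holds exactly one positive value (the names flags and stacked are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

flags = df.set_index('a')
# Mask everything that is not positive, then stack into a long Series.
# dropna() is a safeguard in case the installed pandas version keeps
# the masked (NaN) cells when stacking.
stacked = flags.where(flags > 0).stack().dropna()
# The surviving index entries are (a, column) pairs, which dict() accepts.
result = dict(stacked.index.to_list())
print(result)
```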
Tai
1

You need dot and zip here

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
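
Why dot works here may not be obvious: each entry of the product is the sum of flag * label across the columns, and in Python an int times a str repeats the string, so only the label under the 1 survives. A small sketch of that mechanism, assuming the sample frame from the question; converting the labels to an object array is a defensive choice of this sketch, not something the answer requires:

```python
import pandas as pd

df = pd.DataFrame({'a': ['red', 'yellow', 'blue'],
                   'b': [0, 0, 1], 'c': [0, 1, 0], 'd': [1, 0, 0]})

flags = df.iloc[:, 1:]  # the 0/1 columns b, c, d
# Object dtype so each label is a plain Python str that supports int * str.
labels = flags.columns.to_numpy(dtype=object)

# What dot computes for the first row (0, 0, 1):
# 0 * 'b' is '', 0 * 'c' is '', 1 * 'd' is 'd', and '' + '' + 'd' is 'd'.
first_row = 0 * 'b' + 0 * 'c' + 1 * 'd'

picked = flags.dot(labels)          # one label per row
result = dict(zip(df['a'], picked))
print(result)
```

Note that a row containing more than one 1 would yield the concatenated labels (for example 'bc') rather than an error.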
BENY
0

Hope this works:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})

# idxmax already returns the column labels aligned on df's index
df['e'] = df.iloc[:,1:].idxmax(axis = 1)

newdf = df[["a","e"]]

print (newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
Bhushan Pant
0

You can convert your dataframe to a dict using pandas to_dict with 'list' as the argument. Then iterate over the resulting dict and fetch the label of the column whose value is 1.

>>> {k:df.columns[1:][v.index(1)] for k,v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}
Sohaib Farooqi
0

Set column a as the index, then for each row find the label of the column whose value is 1, and convert the resulting Series to a dictionary using to_dict.

Here is the code:

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

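Alternatively, set the index to a, use idxmax to find the label of the maximum value in each row, then use to_dict to convert the result to a dictionary. Use idxmax rather than argmax here: in current pandas, Series.argmax returns the integer position instead of the label.

df.set_index('a').apply(lambda row:row.idxmax(),axis=1).to_dict()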

In both cases, the result will be

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

P.S. I used apply with axis=1 to iterate through the rows of df.

sgDysregulation