Convert a pandas DataFrame to a dictionary

Question

I have a pandas dataframe as below:

df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})
df

which looks like

    a       b   c   d
0   red     0   0   1
1   yellow  0   1   0
2   blue    1   0   0

I want to convert it to a dictionary so that I get:

red     d
yellow  c
blue    b

The dataset is quite large, so please avoid any iterative method. I haven't figured out a solution yet. Any help is appreciated.

Possible duplicate of [Convert a Pandas DataFrame to a dictionary](https://stackoverflow.com/questions/26716616/convert-a-pandas-dataframe-to-a-dictionary) — Vivek Kalyanarangan, Feb 04 '18 at 05:36
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_dict.html subset your data and then do `to_dict` which is available off the shelf with `pandas` — Vivek Kalyanarangan, Feb 04 '18 at 05:36

score 9 · Answer 1 · answered Feb 04 '18 at 05:51

First of all, if you really want to convert this to a dictionary, it's a little nicer to convert the value you want as a key into the index of the DataFrame:

df.set_index('a', inplace=True)

This looks like:

        b  c  d
a              
red     0  0  1
yellow  0  1  0
blue    1  0  0

Your data appears to be in "one-hot" encoding. You first have to reverse that with idxmax, which returns, for each row, the label of the column holding the maximum value:

series = df.idxmax(axis=1)

This looks like:

a
red       d
yellow    c
blue      b
dtype: object

Almost there! Now use to_dict on the resulting Series (this is where setting column a as the index helps out):

series.to_dict()

This looks like:

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

Which I think is what you are looking for. As a one-liner:

df.set_index('a').idxmax(axis=1).to_dict()
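For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch using the frame from the question. One caveat worth keeping in mind: idxmax returns the first column that holds the row maximum, so a row with no 1 at all would still be mapped to the first column (an assumption to check against your own data).

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

# set_index without inplace=True returns a new frame, so df itself is unchanged
result = df.set_index('a').idxmax(axis=1).to_dict()
print(result)  # {'red': 'd', 'yellow': 'c', 'blue': 'b'}
```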

Good explanation. I like the simple steps you took – James Schinner Feb 04 '18 at 05:55

Tai · Answer 2 · 2018-02-04T06:03:57.957

2

You can try this.

df = df.set_index('a')
df.where(df > 0).stack().reset_index().drop(0, axis=1)


    a   level_1
0   red     d
1   yellow  c
2   blue    b
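The result above is still a DataFrame rather than a dictionary. A possible way to finish the job, sketched under the assumption that each row has exactly one positive value: the stacked Series carries (a, column label) pairs in its MultiIndex, and those pairs can be passed straight to dict. The explicit dropna() is a precaution in case stack() keeps the NaN rows in the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

indexed = df.set_index('a')

# where() replaces the zeros with NaN; after stacking, dropna() leaves
# one entry per row, indexed by (value of a, column label)
stacked = indexed.where(indexed > 0).stack().dropna()

# Each MultiIndex entry is a (key, label) tuple, which dict() accepts directly
mapping = dict(stacked.index.tolist())
print(mapping)
```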

edited Feb 04 '18 at 06:03

answered Feb 04 '18 at 05:48

Tai


score 1 · Answer 3 · answered Feb 04 '18 at 07:04

You need dot and zip here

dict(zip(df.a,df.iloc[:,1:].dot(df.iloc[:,1:].columns)))
Out[508]: {'blue': 'b', 'red': 'd', 'yellow': 'c'}
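Why does dot work with column labels? For each row it computes the sum of value * label, and in Python 0 * 'd' is the empty string while 1 * 'd' is 'd', so the "sum" is just the label of the column holding the 1. The sketch below spells that arithmetic out row by row. It is for illustration only (it is iterative, so it is not a replacement for dot on a large frame), and the name explained is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

labels = list(df.columns[1:])  # ['b', 'c', 'd']

# Per row: multiply each label by its 0/1 flag and concatenate the pieces.
# Only the label whose flag is 1 survives; the others become ''.
explained = {
    name: ''.join(label * int(flag) for flag, label in zip(flags, labels))
    for name, flags in zip(df['a'], df[labels].to_numpy())
}
print(explained)
```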

BENY

Perhaps simply `df.set_index('a').dot(df.columns[1:]).to_dict()` – Bharath M Shetty Feb 04 '18 at 12:36

Bhushan Pant · Answer 4 · 2018-02-04T05:45:40.237

0

Hope this works:

import pandas as pd
df=pd.DataFrame({'a':['red','yellow','blue'], 'b':[0,0,1], 'c':[0,1,0], 'd':[1,0,0]})

# idxmax(axis=1) already returns, for each row, the label of the column holding the 1
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

newdf = df[["a", "e"]]

print(newdf.to_dict(orient='index'))

Output:

{0: {'a': 'red', 'e': 'd'}, 1: {'a': 'yellow', 'e': 'c'}, 2: {'a': 'blue', 'e': 'b'}}
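The question asks for a flat mapping from the value in a to a column label, not one record per row. A minimal sketch of one way to get there from the same two columns, assuming the values in a are unique (duplicates would overwrite each other in the dict); the name flat is hypothetical:

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['red', 'yellow', 'blue'],
    'b': [0, 0, 1],
    'c': [0, 1, 0],
    'd': [1, 0, 0],
})

# Label of the column holding the 1 in each row
df['e'] = df.iloc[:, 1:].idxmax(axis=1)

# Use column a as the index so that to_dict() maps a -> e directly
flat = df.set_index('a')['e'].to_dict()
print(flat)
```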

edited Feb 04 '18 at 05:45

answered Feb 04 '18 at 05:39

Bhushan Pant


Yes, I am using python 2.7 – Bhushan Pant Feb 04 '18 at 05:41
It's tagged `3.x`. The output doesn't look like what OP wanted. – James Schinner Feb 04 '18 at 05:42
Seems, I forgot to use the axis column. And I checked it for python3 also, working fine. – Bhushan Pant Feb 04 '18 at 05:46
@bhushan , thanks for the answer , but the output is not correct .. i want a different format – singh Feb 04 '18 at 05:50

Sohaib Farooqi · Answer 5 · 2018-02-04T06:02:37.110

0

You can convert your dataframe to a dict using pandas to_dict with 'list' as the argument, then iterate over the resulting dict and fetch the column label whose value is 1.

>>> {k:df.columns[1:][v.index(1)] for k,v in df.set_index('a').T.to_dict('list').items()}
{'yellow': 'c', 'blue': 'b', 'red': 'd'}

edited Feb 04 '18 at 06:02

answered Feb 04 '18 at 05:51

Sohaib Farooqi


Thanks for the solution, but it is an iterative one and is slow for my large dataset. – singh Feb 04 '18 at 08:09

sgDysregulation · Answer 6 · 2018-02-04T14:37:09.183

Set column a as the index, then for each row find the label of the column whose value is 1, and convert the resulting Series to a dictionary using to_dict.

Here is the code:

df.set_index('a').apply(lambda row:row[row==1].index[0],axis=1).to_dict()

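Alternatively, set the index to a, use idxmax to find the label of the maximum value in each row, then use to_dict to convert to a dictionary. (On current pandas, Series.argmax returns an integer position rather than a label, so idxmax is the call to use here.)

df.set_index('a').apply(lambda row:row.idxmax(),axis=1).to_dict()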

In both cases, the result will be

{'blue': 'b', 'red': 'd', 'yellow': 'c'}

P.S. I used apply with axis=1 to iterate through the rows of df.
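Since apply with axis=1 calls the lambda once per row, it may be slow on a large frame. The sketch below is only a sanity check that the row-wise apply and a column-wise idxmax(axis=1) agree on randomly generated one-hot data. The frame size, the seed, and the item names in column a are all made up for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_rows = 1000
labels = np.array(['b', 'c', 'd'])

# Position of the single 1 in each row, chosen at random
hot = rng.integers(0, len(labels), size=n_rows)

# Build the one-hot frame; the keys in column a are hypothetical unique names
one_hot = pd.DataFrame(np.eye(len(labels), dtype=int)[hot], columns=labels)
one_hot.insert(0, 'a', [f'item{i}' for i in range(n_rows)])

indexed = one_hot.set_index('a')

# Row-wise version from this answer and a vectorized alternative
via_apply = indexed.apply(lambda row: row[row == 1].index[0], axis=1).to_dict()
via_idxmax = indexed.idxmax(axis=1).to_dict()

print(via_apply == via_idxmax)
```

Wrapping each of the two lines in a timer (for example with timeit) on your own data would show how much the per-row overhead matters in practice.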
