How to convert a dataframe to a dictionary

Question

I have a dataframe with two columns and I intend to convert it to a dictionary. The first column will be the key and the second will be the value.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

How can I do this?

score 387 · Answer 1 · edited Mar 21 '22 at 13:59

387

If lakes is your DataFrame, you can do something like

area_dict = dict(zip(lakes.id, lakes.value))

edited Mar 21 '22 at 13:59

TylerH

20,799
66
75
101

answered Aug 02 '13 at 09:42

punchagan

5,566
1
19
23

42

Solution: `area_dict = dict(zip(lakes['id'], lakes['value']))` – jezrael Nov 30 '15 at 08:56
2

What if you wanted more than one column to be the in the dictionary values? I am thinking something like `area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column)))`. How would you make this happen? – jesseaam Mar 28 '19 at 16:09
6

If the second argument has multiple values, this won't work. – pnv Apr 25 '19 at 05:47

joris · Answer 2 · 2016-01-05T22:19:56.620

213

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

And if you have only one column, to avoid the column name is also a level in the dict (actually, in this case you use the Series.to_dict()):

df.set_index('id')['value'].to_dict()

edited Jan 05 '16 at 22:19

answered Sep 09 '13 at 09:55

joris

133,120
36
247
202

26

Note that this command will lose data if there redundant values in the ID columns: `>>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) >>> ptest.set_index('id')['value'].to_dict()` – dalloliogm Jun 23 '14 at 11:32
12

I have to say, there is nothing in that docs link that would have given me the answer to this question. – Ben Fulton May 11 '17 at 16:00

score 93 · Answer 3 · answered Oct 03 '16 at 17:41

93

mydict = dict(zip(df.id, df.value))

answered Oct 03 '16 at 17:41

praful gupta

1,031
7
2

4

Note: in case the index is the desired dictionary key, then do: dict(zip(df.index,df.value)) – aLbAc Feb 19 '18 at 12:03

score 66 · Answer 4 · answered Jun 23 '14 at 16:08

66

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}

answered Jun 23 '14 at 16:08

DSM

342,061
65
592
494

2

Nice and elegant solution, but on a 50k rows table, it is about 6 times slower than my ugly solution below. – dalloliogm Jun 24 '14 at 10:44
@dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas. – DSM Jun 24 '14 at 11:42

score 33 · Answer 5 · edited May 23 '17 at 10:31

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant, however they will not give correct results if the column used for the keys contains any duplicated value.

For example:

>>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}

Excuse the formatting due to the lack of a block in comments: `mydict = defaultdict(list)\n for (key, val) in ptest[["id", "value"]].itertuples(index=False):\n mydict[key].append(val)` — Midnighter, Jun 23 '14 at 14:45

score 15 · Answer 6 · edited Mar 21 '22 at 13:42

Here is what I think is the simplest solution:

df.set_index('id').T.to_dict('records')

Example:

df= pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple values, like val1, val2, val3, etc., and you want them as lists, then use the below code:

df.set_index('id').T.to_dict('list')

Read more about records from above here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html

score 13 · Answer 7 · edited Sep 25 '18 at 16:03

13

With pandas it can be done as:

If lakes is your DataFrame:

area_dict = lakes.to_dict('records')

edited Sep 25 '18 at 16:03

Rahul Agarwal

4,034
7
27
51

answered Apr 17 '18 at 07:55

AnandSin

319
2
3

1

there is no 'records' column in given example. Also in such case the index will be the key, which not what we want to. – Michael D Jun 13 '18 at 12:45
20

@MichaelD 'records' is not a column. It's an option for the argument `orient`. – Zheng Liu Nov 28 '18 at 09:36
12

This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict. – Roei Bahumi Dec 09 '20 at 12:07

score 13 · Answer 8 · answered Sep 14 '18 at 07:06

13

You can use 'dict comprehension'

my_dict = {row[0]: row[1] for row in df.values}

answered Sep 14 '18 at 07:06

Dongwan Kim

131
1
3

1

Looping with pandas isn't the most efficient in terms of memory usage. See: https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6 – tda Sep 14 '18 at 07:13

score 9 · Answer 9 · answered Jan 05 '18 at 00:16

9

in some versions the code below might not work

mydict = dict(zip(df.id, df.value))

so make it explicit

id_=df.id.values
value=df.value.values
mydict=dict(zip(id_,value))

Note i used id_ because the word id is reserved word

answered Jan 05 '18 at 00:16

Vincent Appiah

101
1
1

Agree, it did not work for me. But how can you do `df.id`, the column name `id` is not recognized as a data frame variable, right? As in, a variable written into the data frame object library. I must be misunderstanding something. – Azurespot Jul 15 '21 at 00:52

score 6 · Answer 10 · edited Mar 21 '22 at 13:44

Here is an example for converting a dataframe with three columns A, B, and C (let's say A and B are the geographical coordinates of longitude and latitude and C the country region/state/etc., which is more or less the case).

I want a dictionary with each pair of A,B values (dictionary key) matching the value of C (dictionary value) in the corresponding row (each pair of A,B values is guaranteed to be unique due to previous filtering, but it is possible to have the same value of C for different pairs of A,B values in this context), so I would do:

mydict = dict(zip(zip(df['A'],df['B']), df['C']))

Using pandas to_dict() also works:

mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']

(none of the columns A or B are used as an index before executing the line creating the dictionary)

Both approaches are fast (less than one second on a dataframe with 85k rows on a ~2015 fast dual-core laptop).

What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast". — TylerH, Mar 21 '22 at 13:45

score 4 · Answer 11 · answered Oct 23 '17 at 16:29

4

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'b': [3], 'a': [1, 2]}

answered Oct 23 '17 at 16:29

user1376377

125
1
1
5

Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:54

score 3 · Answer 12 · answered Nov 13 '18 at 23:46

You can also do this if you want to play around with pandas. However, I like punchagan's way.

# replicating your dataframe
lake = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lake.set_index('co tp', inplace=True)

# to get key value using pandas
area_dict = lake.set_index('area').T.to_dict('records')[0]
print(area_dict)

output: {10: 7, 20: 5, 30: 2, 40: 3}

score 3 · Answer 13 · answered May 03 '21 at 02:00

3

If 'lakes' is your DataFrame, you can also do something like:

# Your dataframe
lakes = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lakes.set_index('co tp', inplace=True)

My solution:

area_dict = lakes.set_index("area")["count"].to_dict()

or @punchagan 's solution (which I prefer)

area_dict = dict(zip(lakes.area, lakes.count))

Both should work.

answered May 03 '21 at 02:00

Allan

321
1
8

Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:54

score 2 · Answer 14 · answered Jan 14 '22 at 01:27

2

you need this it

area_dict = lakes.to_dict(orient='records')

answered Jan 14 '22 at 01:27

Heeda

29
1

2

Hi, it would be great if you could help us to understand what your code does and how it solves the OP's problem! – Simas Joneliunas Jan 18 '22 at 06:21
1

This just repeats an existing answer by AnandSin from 2018. – TylerH Mar 21 '22 at 13:46

score 1 · Answer 15 · answered Jan 18 '18 at 00:07

1

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values,df.value.values):
    mydict[k].append(v)

answered Jan 18 '18 at 00:07

Dmitry

31
3

score 1 · Answer 16 · answered Jan 24 '21 at 15:20

1

If you set the the index than the dictionary will result in unique key value pairs

encoder=LabelEncoder()
df['airline_enc']=encoder.fit_transform(df['airline'])
dictAirline= df[['airline_enc','airline']].set_index('airline_enc').to_dict()

answered Jan 24 '21 at 15:20

Golden Lion

3,840
2
26
35

score 1 · Answer 17 · answered Mar 22 '23 at 17:39

Many answers here use dict(zip(...)) syntax. It's also possible without zip.

mydict = dict(df.values)                        # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}
# or for faster code, convert to a list
mydict = dict(df.values.tolist())               # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}

If one column is int and the other is float as in the OP, then cast to object dtype and call dict().

mydict = dict(df.astype('O').values)            # {0: 10.2, 1: 5.7, 2: 7.4}
mydict = dict(df.astype('O').values.tolist())   # {0: 10.2, 1: 5.7, 2: 7.4}

If the index is meant to be the keys, it's even simpler.

mydict = df['value'].to_dict()                  # {0: 10.2, 1: 5.7, 2: 7.4}

score 1 · Answer 18 · answered Apr 17 '23 at 16:31

Edit:

Same result could be reached by the following:

filter_list = df[df.Col.isin(criteria)][['Col1','Col2']].values.tolist()

Original Post:

I had a similar issue, where I was looking to filter a dataframe into a resulting list of lists.

This was my solution:

filter_df = df[df.Col.isin(criteria)][['Col1','Col2']]
filter_list = filter_df.to_dict(orient='tight')
filter_list = filter_list['data']

Result: list of lists

Source: pandas.DataFrame.to_dict

Anil Kumar · Answer 19 · 2022-11-23T09:11:56.127

0

If there exists some duplicate values in the value columns and if we want to keep the duplicate values in the dictionary

below code could help

df = pd.DataFrame([['a',1],['a',2],['a',4],['b',3],['b',4],['c',5]], columns=['id', 'value'])

df.groupby('id')['value'].apply(list).to_dict()

output : {'a': [1, 2, 4], 'b': [3, 4], 'c': [5]}

edited Nov 23 '22 at 09:11

answered Nov 23 '22 at 09:04

Anil Kumar

385
2
17

score -1 · Answer 20 · edited Mar 21 '22 at 13:47

-1

def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution; a basic loop.

edited Mar 21 '22 at 13:47

TylerH

20,799
66
75
101

answered Mar 20 '20 at 02:15

SummersKing

301
1
11

Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:53

score -1 · Answer 21 · answered Apr 04 '20 at 11:39

-1

This is my solution:

import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)

answered Apr 04 '20 at 11:39

Hamoon

1

Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:53

How to convert a dataframe to a dictionary

21 Answers21

My solution:

Linked

Related