184

I have a dataframe with two columns and I intend to convert it to a dictionary. The first column will be the key and the second will be the value.

Dataframe:

    id    value
0    0     10.2
1    1      5.7
2    2      7.4

How can I do this?

cottontail
  • 10,268
  • 18
  • 50
  • 51
perigee
  • 9,438
  • 11
  • 31
  • 35

21 Answers21

387

If lakes is your DataFrame, you can do something like

area_dict = dict(zip(lakes.id, lakes.value))
TylerH
  • 20,799
  • 66
  • 75
  • 101
punchagan
  • 5,566
  • 1
  • 19
  • 23
  • 42
    Solution: `area_dict = dict(zip(lakes['id'], lakes['value']))` – jezrael Nov 30 '15 at 08:56
  • 2
    What if you wanted more than one column to be the in the dictionary values? I am thinking something like `area_dict = dict(zip(lakes.area, (lakes.count, lakes.other_column)))`. How would you make this happen? – jesseaam Mar 28 '19 at 16:09
  • 6
    If the second argument has multiple values, this won't work. – pnv Apr 25 '19 at 05:47
213

See the docs for to_dict. You can use it like this:

df.set_index('id').to_dict()

And if you have only one column, to avoid the column name is also a level in the dict (actually, in this case you use the Series.to_dict()):

df.set_index('id')['value'].to_dict()
joris
  • 133,120
  • 36
  • 247
  • 202
  • 26
    Note that this command will lose data if there redundant values in the ID columns: `>>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) >>> ptest.set_index('id')['value'].to_dict()` – dalloliogm Jun 23 '14 at 11:32
  • 12
    I have to say, there is nothing in that docs link that would have given me the answer to this question. – Ben Fulton May 11 '17 at 16:00
93
mydict = dict(zip(df.id, df.value))
praful gupta
  • 1,031
  • 7
  • 2
  • 4
    Note: in case the index is the desired dictionary key, then do: dict(zip(df.index,df.value)) – aLbAc Feb 19 '18 at 12:03
66

If you want a simple way to preserve duplicates, you could use groupby:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3
>>> {k: g["value"].tolist() for k,g in ptest.groupby("id")}
{'a': [1, 2], 'b': [3]}
DSM
  • 342,061
  • 65
  • 592
  • 494
  • 2
    Nice and elegant solution, but on a 50k rows table, it is about 6 times slower than my ugly solution below. – dalloliogm Jun 24 '14 at 10:44
  • @dalloliogm: could you give an example table that happens for? If it's six times slower than a Python loop, there might be a performance bug in pandas. – DSM Jun 24 '14 at 11:42
33

The answers by joris in this thread and by punchagan in the duplicated thread are very elegant, however they will not give correct results if the column used for the keys contains any duplicated value.

For example:

>>> ptest = p.DataFrame([['a',1],['a',2],['b',3]], columns=['id', 'value']) 
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

# note that in both cases the association a->1 is lost:
>>> ptest.set_index('id')['value'].to_dict()
{'a': 2, 'b': 3}
>>> dict(zip(ptest.id, ptest.value))
{'a': 2, 'b': 3}

If you have duplicated entries and do not want to lose them, you can use this ugly but working code:

>>> mydict = {}
>>> for x in range(len(ptest)):
...     currentid = ptest.iloc[x,0]
...     currentvalue = ptest.iloc[x,1]
...     mydict.setdefault(currentid, [])
...     mydict[currentid].append(currentvalue)
>>> mydict
{'a': [1, 2], 'b': [3]}
Community
  • 1
  • 1
dalloliogm
  • 8,718
  • 6
  • 45
  • 55
  • 3
    Excuse the formatting due to the lack of a block in comments: `mydict = defaultdict(list)\n for (key, val) in ptest[["id", "value"]].itertuples(index=False):\n mydict[key].append(val)` – Midnighter Jun 23 '14 at 14:45
15

Here is what I think is the simplest solution:

df.set_index('id').T.to_dict('records')

Example:

df= pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
df.set_index('id').T.to_dict('records')

If you have multiple values, like val1, val2, val3, etc., and you want them as lists, then use the below code:

df.set_index('id').T.to_dict('list')

Read more about records from above here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_dict.html

TylerH
  • 20,799
  • 66
  • 75
  • 101
Gil Baggio
  • 13,019
  • 3
  • 48
  • 37
13

With pandas it can be done as:

If lakes is your DataFrame:

area_dict = lakes.to_dict('records')
Rahul Agarwal
  • 4,034
  • 7
  • 27
  • 51
AnandSin
  • 319
  • 2
  • 3
  • 1
    there is no 'records' column in given example. Also in such case the index will be the key, which not what we want to. – Michael D Jun 13 '18 at 12:45
  • 20
    @MichaelD 'records' is not a column. It's an option for the argument `orient`. – Zheng Liu Nov 28 '18 at 09:36
  • 12
    This will actually output a list of dictionaries in the following format: [{'area': 10, 'count': 7}, {'area': 20, 'count': 5}...] instead of a key->value dict. – Roei Bahumi Dec 09 '20 at 12:07
13

You can use 'dict comprehension'

my_dict = {row[0]: row[1] for row in df.values}
Dongwan Kim
  • 131
  • 1
  • 3
  • 1
    Looping with pandas isn't the most efficient in terms of memory usage. See: https://engineering.upside.com/a-beginners-guide-to-optimizing-pandas-code-for-speed-c09ef2c6a4d6 – tda Sep 14 '18 at 07:13
9

in some versions the code below might not work

mydict = dict(zip(df.id, df.value))

so make it explicit

id_=df.id.values
value=df.value.values
mydict=dict(zip(id_,value))

Note i used id_ because the word id is reserved word

Vincent Appiah
  • 101
  • 1
  • 1
  • Agree, it did not work for me. But how can you do `df.id`, the column name `id` is not recognized as a data frame variable, right? As in, a variable written into the data frame object library. I must be misunderstanding something. – Azurespot Jul 15 '21 at 00:52
6

Here is an example for converting a dataframe with three columns A, B, and C (let's say A and B are the geographical coordinates of longitude and latitude and C the country region/state/etc., which is more or less the case).

I want a dictionary with each pair of A,B values (dictionary key) matching the value of C (dictionary value) in the corresponding row (each pair of A,B values is guaranteed to be unique due to previous filtering, but it is possible to have the same value of C for different pairs of A,B values in this context), so I would do:

mydict = dict(zip(zip(df['A'],df['B']), df['C']))

Using pandas to_dict() also works:

mydict = df.set_index(['A','B']).to_dict(orient='dict')['C']

(none of the columns A or B are used as an index before executing the line creating the dictionary)

Both approaches are fast (less than one second on a dataframe with 85k rows on a ~2015 fast dual-core laptop).

TylerH
  • 20,799
  • 66
  • 75
  • 101
  • What is a "fast dual-core laptop"? That line would be better removed or replaced with a specific laptop and CPU model. Let us decide for ourselves if it is "fast". – TylerH Mar 21 '22 at 13:45
4

Another (slightly shorter) solution for not losing duplicate entries:

>>> ptest = pd.DataFrame([['a',1],['a',2],['b',3]], columns=['id','value'])
>>> ptest
  id  value
0  a      1
1  a      2
2  b      3

>>> pdict = dict()
>>> for i in ptest['id'].unique().tolist():
...     ptest_slice = ptest[ptest['id'] == i]
...     pdict[i] = ptest_slice['value'].tolist()
...

>>> pdict
{'b': [3], 'a': [1, 2]}
user1376377
  • 125
  • 1
  • 1
  • 5
  • Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:54
3

You can also do this if you want to play around with pandas. However, I like punchagan's way.

# replicating your dataframe
lake = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lake.set_index('co tp', inplace=True)

# to get key value using pandas
area_dict = lake.set_index('area').T.to_dict('records')[0]
print(area_dict)

output: {10: 7, 20: 5, 30: 2, 40: 3}
Samlex
  • 161
  • 3
  • 8
3

If 'lakes' is your DataFrame, you can also do something like:

# Your dataframe
lakes = pd.DataFrame({'co tp': ['DE Lake', 'Forest', 'FR Lake', 'Forest'], 
                 'area': [10, 20, 30, 40], 
                 'count': [7, 5, 2, 3]})
lakes.set_index('co tp', inplace=True)

My solution:

area_dict = lakes.set_index("area")["count"].to_dict()

or @punchagan 's solution (which I prefer)

area_dict = dict(zip(lakes.area, lakes.count))

Both should work.

Allan
  • 321
  • 1
  • 8
  • Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:54
2

you need this it

area_dict = lakes.to_dict(orient='records')
Heeda
  • 29
  • 1
1

You need a list as a dictionary value. This code will do the trick.

from collections import defaultdict
mydict = defaultdict(list)
for k, v in zip(df.id.values,df.value.values):
    mydict[k].append(v)
Dmitry
  • 31
  • 3
1

If you set the the index than the dictionary will result in unique key value pairs

encoder=LabelEncoder()
df['airline_enc']=encoder.fit_transform(df['airline'])
dictAirline= df[['airline_enc','airline']].set_index('airline_enc').to_dict()
Golden Lion
  • 3,840
  • 2
  • 26
  • 35
1

Many answers here use dict(zip(...)) syntax. It's also possible without zip.

mydict = dict(df.values)                        # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}
# or for faster code, convert to a list
mydict = dict(df.values.tolist())               # {0.0: 10.2, 1.0: 5.7, 2.0: 7.4}

If one column is int and the other is float as in the OP, then cast to object dtype and call dict().

mydict = dict(df.astype('O').values)            # {0: 10.2, 1: 5.7, 2: 7.4}
mydict = dict(df.astype('O').values.tolist())   # {0: 10.2, 1: 5.7, 2: 7.4}

If the index is meant to be the keys, it's even simpler.

mydict = df['value'].to_dict()                  # {0: 10.2, 1: 5.7, 2: 7.4}
cottontail
  • 10,268
  • 18
  • 50
  • 51
1

Edit:

Same result could be reached by the following:

filter_list = df[df.Col.isin(criteria)][['Col1','Col2']].values.tolist()

Original Post:

I had a similar issue, where I was looking to filter a dataframe into a resulting list of lists.

This was my solution:

filter_df = df[df.Col.isin(criteria)][['Col1','Col2']]
filter_list = filter_df.to_dict(orient='tight')
filter_list = filter_list['data']

Result: list of lists

Source: pandas.DataFrame.to_dict

hitmikey
  • 23
  • 6
0

If there exists some duplicate values in the value columns and if we want to keep the duplicate values in the dictionary

below code could help

df = pd.DataFrame([['a',1],['a',2],['a',4],['b',3],['b',4],['c',5]], columns=['id', 'value'])

df.groupby('id')['value'].apply(list).to_dict()

output : {'a': [1, 2, 4], 'b': [3, 4], 'c': [5]}
Anil Kumar
  • 385
  • 2
  • 17
-1
def get_dict_from_pd(df, key_col, row_col):
    result = dict()
    for i in set(df[key_col].values):
        is_i = df[key_col] == i
        result[i] = list(df[is_i][row_col].values)
    return result

This is my solution; a basic loop.

TylerH
  • 20,799
  • 66
  • 75
  • 101
SummersKing
  • 301
  • 1
  • 11
  • Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:53
-1

This is my solution:

import pandas as pd
df = pd.read_excel('dic.xlsx')
df_T = df.set_index('id').T
dic = df_T.to_dict('records')
print(dic)
Hamoon
  • 1
  • Please read [answer] and always remember that you are not merely solving the problem at hand, but also educating the OP and any future readers of this question and answer. Thus, please [edit] the answer to include an explanation as to why it works. – Adriaan Mar 21 '22 at 13:53