412

I have a DataFrame with four columns. I want to convert this DataFrame to a Python dictionary, where the elements of the first column are the keys and the elements of the other columns in the same row are the values.

DataFrame:

  ID  A  B  C
0  p  1  3  2
1  q  4  3  2
2  r  4  0  9

Output should be like this:

{'p': [1,3,2], 'q': [4,3,2], 'r': [4,0,9]}
cottontail
Prince Bhatti

12 Answers

702

The to_dict() method sets the column names as dictionary keys so you'll need to reshape your DataFrame slightly. Setting the 'ID' column as the index and then transposing the DataFrame is one way to achieve this.

to_dict() also accepts an 'orient' argument which you'll need in order to output a list of values for each column. Otherwise, a dictionary of the form {index: value} will be returned for each column.

These steps can be done with the following line:

>>> df.set_index('ID').T.to_dict('list')
{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}

In case a different dictionary format is needed, here are examples of the possible orient arguments. Consider the following simple DataFrame:

>>> df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
>>> df
        a      b
0     red  0.500
1  yellow  0.250
2    blue  0.125

Then the options are as follows.

dict - the default: column names are keys, values are dictionaries of index:data pairs

>>> df.to_dict('dict')
{'a': {0: 'red', 1: 'yellow', 2: 'blue'}, 
 'b': {0: 0.5, 1: 0.25, 2: 0.125}}

list - keys are column names, values are lists of column data

>>> df.to_dict('list')
{'a': ['red', 'yellow', 'blue'], 
 'b': [0.5, 0.25, 0.125]}

series - like 'list', but values are Series

>>> df.to_dict('series')
{'a': 0       red
      1    yellow
      2      blue
      Name: a, dtype: object, 

 'b': 0    0.500
      1    0.250
      2    0.125
      Name: b, dtype: float64}

split - keys are 'columns', 'data' and 'index'; values are the column names, the data values by row, and the index labels, respectively

>>> df.to_dict('split')
{'columns': ['a', 'b'],
 'data': [['red', 0.5], ['yellow', 0.25], ['blue', 0.125]],
 'index': [0, 1, 2]}

records - each row becomes a dictionary whose keys are the column names and whose values are the data in the cells

>>> df.to_dict('records')
[{'a': 'red', 'b': 0.5}, 
 {'a': 'yellow', 'b': 0.25}, 
 {'a': 'blue', 'b': 0.125}]

index - like 'records', but a dictionary of dictionaries with keys as index labels (rather than a list)

>>> df.to_dict('index')
{0: {'a': 'red', 'b': 0.5},
 1: {'a': 'yellow', 'b': 0.25},
 2: {'a': 'blue', 'b': 0.125}}
Alex Riley
  • 30
    this will be a one-liner: `df.set_index('ID').T.to_dict('list')` – Anzel Nov 03 '14 at 14:58
  • 1
    For one record in Data Frame. df.T.to_dict()[0] – kamran kausar Jun 19 '19 at 09:10
  • 2
    `df.to_dict('records')` is how most modern software would want a dict that matches a JSON-like pattern – GavinBelson Aug 25 '20 at 22:32
  • If you need a variation of the 'index' method where each value of the dictionary is a pd.Series, i.e. breaking the dataframe down into a dictionary of Series, that can be achieved using `dict(list(df.iterrows()))`. Enjoy! – George Pligoropoulos Oct 10 '22 at 14:43
  • sorry if this is a silly question, is there a best one for sharing (e.g. in giving an excerpt for a SE question)? Or is there a 'best practice' for choosing which to use? thanks – Abdul-Kareem Abdul-Rahman Dec 05 '22 at 15:22
162

Should a dictionary like:

{'red': 0.5, 'yellow': 0.25, 'blue': 0.125}

be required out of a dataframe like:

        a      b
0     red  0.500
1  yellow  0.250
2    blue  0.125

the simplest way would be:

dict(df.values)

working snippet below:

import pandas as pd
df = pd.DataFrame({'a': ['red', 'yellow', 'blue'], 'b': [0.5, 0.25, 0.125]})
dict(df.values)
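
Note that dict(df.values) pairs each row's first element with its second, so it assumes the frame has exactly two columns. For a frame with one key column and several value columns, as in the question, a similar one-liner could be (a small sketch reusing the question's ID/A/B/C columns):

import pandas as pd

df = pd.DataFrame({'ID': ['p', 'q', 'r'],
                   'A': [1, 4, 4], 'B': [3, 3, 0], 'C': [2, 2, 9]})

# zip the key column with row-wise lists of the remaining value columns
dict(zip(df['ID'], df[['A', 'B', 'C']].values.tolist()))
# {'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}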
ddejohn
Muhammad Moiz Ahmed
37

Follow these steps:

Suppose your dataframe is as follows:

>>> df
   A  B  C ID
0  1  3  2  p
1  4  3  2  q
2  4  0  9  r

1. Use set_index to set the ID column as the dataframe index.

    df.set_index("ID", drop=True, inplace=True)

2. Use the orient="index" parameter to have the index as the dictionary keys.

    dictionary = df.to_dict(orient="index")

The results will be as follows:

    >>> dictionary
    {'q': {'A': 4, 'B': 3, 'C': 2}, 'p': {'A': 1, 'B': 3, 'C': 2}, 'r': {'A': 4, 'B': 0, 'C': 9}}

3. If you need each sample to be a list, run the following code to build the lists in your preferred column order (an equivalent dict comprehension is sketched after this block).

column_order = ["A", "B", "C"]  # determine your preferred order of columns
d = {}  # initialize the new dictionary as an empty dictionary
for k in dictionary:
    d[k] = [dictionary[k][column_name] for column_name in column_order]
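
As a comment below also notes, the loop and list comprehension above can be collapsed into a single dict comprehension; a minimal equivalent sketch:

# one dict comprehension instead of the loop, using the same `dictionary` and `column_order`
d = {k: [row[column_name] for column_name in column_order] for k, row in dictionary.items()}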
Farhad Maleki
  • 2
    For the last bit it seems you'd be better off using a dict comprehension to replace the for loop + list comprehension (3 lines -> 1). Either way, although it's nice to have options, the top answer is a lot shorter. – fantabolous Nov 11 '16 at 01:02
  • This one is handy because it explains clearly how to use a specific column or header as the index. – Tropicalrambler Feb 04 '20 at 05:34
33

Try using zip:

df = pd.read_csv("file")
d = dict([(i, [a, b, c]) for i, a, b, c in zip(df.ID, df.A, df.B, df.C)])
print(d)

Output:

{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
13

If you don't mind the dictionary values being tuples, you can use itertuples:

>>> {x[0]: x[1:] for x in df.itertuples(index=False)}
{'p': (1, 3, 2), 'q': (4, 3, 2), 'r': (4, 0, 9)}
Kamil Sindi
6

For my use (node names with xy positions) I found @user4179775's answer to be the most helpful / intuitive:

import pandas as pd

df = pd.read_csv('glycolysis_nodes_xy.tsv', sep='\t')

df.head()
    nodes    x    y
0  c00033  146  958
1  c00031  601  195
...

xy_dict_list=dict([(i,[a,b]) for i, a,b in zip(df.nodes, df.x,df.y)])

xy_dict_list
{'c00022': [483, 868],
 'c00024': [146, 868],
 ... }

xy_dict_tuples=dict([(i,(a,b)) for i, a,b in zip(df.nodes, df.x,df.y)])

xy_dict_tuples
{'c00022': (483, 868),
 'c00024': (146, 868),
 ... }

Addendum

I later returned to this issue for other, related work. Here is an approach that more closely mirrors the [excellent] accepted answer.

node_df = pd.read_csv('node_prop-glycolysis_tca-from_pg.tsv', sep='\t')

node_df.head()
   node  kegg_id kegg_cid            name  wt  vis
0  22    22       c00022   pyruvate        1   1
1  24    24       c00024   acetyl-CoA      1   1
...

Convert Pandas dataframe to a [list], {dict}, {dict of {dict}}, ...

Per accepted answer:

node_df.set_index('kegg_cid').T.to_dict('list')

{'c00022': [22, 22, 'pyruvate', 1, 1],
 'c00024': [24, 24, 'acetyl-CoA', 1, 1],
 ... }

node_df.set_index('kegg_cid').T.to_dict('dict')

{'c00022': {'kegg_id': 22, 'name': 'pyruvate', 'node': 22, 'vis': 1, 'wt': 1},
 'c00024': {'kegg_id': 24, 'name': 'acetyl-CoA', 'node': 24, 'vis': 1, 'wt': 1},
 ... }

In my case, I wanted to do the same thing but with selected columns from the Pandas dataframe, so I needed to slice the columns. There are two approaches.

  1. Directly:

(see: Convert pandas to dictionary defining the columns used for the key values)

node_df.set_index('kegg_cid')[['name', 'wt', 'vis']].T.to_dict('dict')

{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
 'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
 ... }
  1. "Indirectly:" first, slice the desired columns/data from the Pandas dataframe (again, two approaches),
node_df_sliced = node_df[['kegg_cid', 'name', 'wt', 'vis']]

or

node_df_sliced2 = node_df.loc[:, ['kegg_cid', 'name', 'wt', 'vis']]

which can then be used to create a dictionary of dictionaries:

node_df_sliced.set_index('kegg_cid').T.to_dict('dict')

{'c00022': {'name': 'pyruvate', 'vis': 1, 'wt': 1},
 'c00024': {'name': 'acetyl-CoA', 'vis': 1, 'wt': 1},
 ... }
Victoria Stuart
  • 1
    The issue with this approach is that the Dataframe columns must be unique or they'll be omitted. Example would be if you have `c00022` for more than one row. – Graciela Carrillo Sep 28 '21 at 15:23
4

Most of the answers do not deal with the situation where ID can appear multiple times in the dataframe. If ID can be duplicated in the dataframe df, you may want to use a list to store the values (i.e., a list of lists), grouped by ID:

{k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()] for k,g in df.groupby('ID')}
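
For example, with the question's data plus an extra 'q' row (made up here purely for illustration), each ID maps to one list per column:

import pandas as pd

df = pd.DataFrame({'ID': ['p', 'q', 'q', 'r'],
                   'A': [1, 4, 5, 4], 'B': [3, 3, 6, 0], 'C': [2, 2, 7, 9]})

{k: [g['A'].tolist(), g['B'].tolist(), g['C'].tolist()] for k, g in df.groupby('ID')}
# {'p': [[1], [3], [2]], 'q': [[4, 5], [3, 6], [2, 7]], 'r': [[4], [0], [9]]}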
Ka Wa Yip
2

A dictionary comprehension with the iterrows() method can also be used to get the desired output:

result = {row.ID: [row.A, row.B, row.C] for (index, row) in df.iterrows()}
melon
1
df = pd.DataFrame([['p',1,3,2], ['q',4,3,2], ['r',4,0,9]], columns=['ID','A','B','C'])
my_dict = {k:list(v) for k,v in zip(df['ID'], df.drop(columns='ID').values)}
print(my_dict)

with output

{'p': [1, 3, 2], 'q': [4, 3, 2], 'r': [4, 0, 9]}
shahar_m
0

With this method, the columns of the dataframe will be the keys and the column data, converted to lists, will be the values.

data_dict = dict()
for col in dataframe.columns:
    data_dict[col] = dataframe[col].values.tolist()
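
This loop is essentially what the 'list' orient from the accepted answer does in a single call; assuming dataframe is a pandas DataFrame, an equivalent one-liner should be:

# built-in 'list' orient: column names as keys, column data as lists
data_dict = dataframe.to_dict('list')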
Adeel Afzal
0

If IDs are unique

set_index().T.to_dict() is a very simple syntax, but transposing a dataframe is really slow. The dict(zip(...)) syntax is about 70 times faster. For example, for a frame with 1 million rows, dict(zip(...)) takes less than 0.5 seconds while set_index().T.to_dict() takes over 30 seconds.

mydict = dict(zip(df['ID'], df.set_index('ID').values.tolist()))

Also a dict comprehension with itertuples is really fast (suggested by Kamil Sindi) but about 3 times slower than dict(zip(...)).

mydict = {x[0]: list(x[1:]) for x in df.itertuples(index=False)}

If IDs are duplicated

itertuples is especially useful if the ID column has duplicate values. It's much faster to loop through the dataframe via itertuples and construct a dict using dict.setdefault than to use groupby (which was suggested by Ka Wa Yip) or iterrows. For example, for a dataframe with 100k rows and 60k unique IDs, itertuples is 250 times faster than groupby.

mydict = {}
for row in df.itertuples(index=False):
    mydict.setdefault(row[0], []).append(list(row[1:]))

Performance benchmarks:

As the runtime plot shows, dict(zip(...)) and itertuples with dict.setdefault are much faster than their more "pandas" counterparts no matter how large the frames are.

[perfplot runtime plot]

Code used to produce the above plots:

import numpy as np
import pandas as pd
from perfplot import plot

plot(
    setup=lambda n: pd.DataFrame({'ID': np.arange(n)}).join(pd.DataFrame(np.random.default_rng().choice(10, size=(n, 3)), columns=[*'ABC'])),
    kernels=[lambda df: dict(zip(df['ID'], df.set_index('ID').values.tolist())), 
             lambda df: df.set_index('ID').T.to_dict('list'), 
             lambda df: {x[0]: list(x[1:]) for x in df.itertuples(index=False)}],
    labels= ["dict(zip(df['ID'], df.set_index('ID').values.tolist()))", 
             "df.set_index('ID').T.to_dict('list')", 
             "{x[0]: list(x[1:]) for x in df.itertuples(index=False)}"],
    n_range=[2**k for k in range(18)],
    xlabel='Number of rows',
    title='Unique IDs',
    equality_check=lambda x,y: x==y);


def itertuples_(df):
    mydict = {}
    for row in df.itertuples(index=False):
        mydict.setdefault(row[0], []).append(list(row[1:]))
    return mydict
        
def groupby_(df):
    return {k: g[['A', 'B', 'C']].values.tolist() for k, g in df.groupby('ID')}

plot(
    setup=lambda n: pd.DataFrame(np.random.default_rng().choice(n, size=(n, 4)), columns=['ID','A','B','C']),
    kernels=[itertuples_, groupby_],
    labels= ["itertuples", "groupby"],
    n_range=[2**k for k in range(17)],
    xlabel="Number of rows",
    title="Duplicated IDs",
    equality_check=lambda x,y: x==y);
cottontail
-1

DataFrame.to_dict() converts a DataFrame to a dictionary.

Example

>>> df = pd.DataFrame(
    {'col1': [1, 2], 'col2': [0.5, 0.75]}, index=['a', 'b'])
>>> df
   col1  col2
a     1  0.50
b     2  0.75
>>> df.to_dict()
{'col1': {'a': 1, 'b': 2}, 'col2': {'a': 0.5, 'b': 0.75}}

See this documentation for details.

Umer
  • 3
    Yes, but the OP explicitly stated they want the row indexes to be the keys, not the column labels. – Vicki B Aug 03 '19 at 16:13