How to create a dictionary of lists from two columns in a dataframe

Question

I have a dataframe like this

import pandas as pd

df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]

What I'm currently using

d = {}
for i in df.A:
    d[i] = []
    for v in df.A[df.A == i].index:
        d[i].append(df.B[v])

Resulting in

{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}

But it's slow.

What is a pythonic way of doing this?

EDIT:

d = {}
for i in df.A.unique():
    d[i] = df[df.A == i].B.tolist()

Still seems like there must be a faster way

Thanks for any help!

Jon Clements · Answer 1 · 2015-03-10T01:37:36.237

16

You can use the DataFrame's groupby and to_dict methods, which keeps the grouping work inside pandas instead of hand-written Python loops, e.g.:

import pandas as pd

df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]

d = df.groupby('A')['B'].apply(list).to_dict()

Gives you:

{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
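
In case the one-liner is hard to read, here is a rough step-by-step sketch of the same chain. The intermediate names (grouped, lists) are mine and only for illustration, and the frame is built from a dict just to keep the snippet short:

```python
import pandas as pd

# Same sample data as in the question, built in one step.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 2, 4, 4, 5],
    "B": [5, 2, 4, 3, 1, 5, 4, 1, 2, 2],
})

# 1. Group the rows by column A and keep only column B of each group.
grouped = df.groupby("A")["B"]

# 2. Turn each group's B values into a plain Python list.
#    The result is a Series indexed by the distinct values of A.
lists = grouped.apply(list)

# 3. Convert that Series into a dict: index -> keys, lists -> values.
d = lists.to_dict()
print(d)  # {1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
```

Row order within each group is preserved, which is why the lists come out in the same order as in the original loop.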

edited Mar 10 '15 at 01:37

answered Mar 10 '15 at 01:30

Jon Clements


I was looking for exactly the same thing and I'm shocked by how simple it is! So, first we `groupby` the column that will supply the keys. In the grouped `df` we select the second column (which contains all the non-aggregated values) and apply the `list` function to it. Am I right? – Soren V. Raben Jun 30 '23 at 14:57

score 3 · Accepted Answer · edited May 23 '17 at 12:32


Look at this: list to dictionary conversion with multiple values per key?

from collections import defaultdict
d = defaultdict(list)
for i, j in zip(df.A,df.B):
    d[i].append(j)

Is this OK?

EDIT: If you want, you can convert it to a plain dict:

d = dict(d)
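
For anyone who wants to try this without a dataframe at hand, here is a self-contained sketch of the same idea. The lists a and b are my stand-ins for df.A and df.B; zip() iterates over them the same way it does over the two columns:

```python
from collections import defaultdict

# Plain lists standing in for df.A and df.B (sample data from the question).
a = [1, 1, 1, 2, 2, 2, 2, 4, 4, 5]
b = [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]

d = defaultdict(list)
for key, value in zip(a, b):
    # A key seen for the first time starts out as an empty list,
    # so there is no need to check for it before appending.
    d[key].append(value)

# Optional: drop the defaultdict behaviour once the dict is built.
d = dict(d)
print(d)  # {1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
```

This makes a single pass over the rows, whereas the loops in the question filter the whole column again for every key.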


answered Mar 10 '15 at 01:14

Marcin Fabrykowski


reticentroot · Answer 3 · 2015-03-10T05:28:36.677

To create a simple dictionary from two lists in Python, you can write (there are variations):

mydict = dict(zip(list1, list2)) # zip() stops at the shorter list, so the lengths should match

Here zip() is a Python built-in that takes one item from each list at the same position and pairs them into tuples (a list of tuples in Python 2, a lazy iterator in Python 3). Passing those tuples to dict() creates a dictionary in which list1 provides the keys and list2 provides the values. Both lists should have the same length, because zip() stops at the shorter one and silently drops any leftover items. Note that a key that appears more than once keeps only its last value, so this builds a plain key-to-value mapping rather than a dictionary of lists.

In Python 2 you can also use izip() from the itertools module, which returns an iterator instead of a list. Both are used the same way, but for large lists izip() is more memory efficient because it yields one tuple at a time instead of building the whole list in memory. In Python 3 the built-in zip() already behaves this way and izip() no longer exists. That said, a dictionary holds all of its contents in memory, which is what makes key lookups quick. (Sorry for the tangent.)
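
To see how dict(zip(...)) behaves on the question's data, here is a small sketch. The names keys, values, flat and grouped are mine, and the lists stand in for the two dataframe columns:

```python
# Plain lists standing in for df.A and df.B (sample data from the question).
keys = [1, 1, 1, 2, 2, 2, 2, 4, 4, 5]
values = [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]

# dict(zip(...)) keeps one value per key: later pairs overwrite earlier ones.
flat = dict(zip(keys, values))
print(flat)  # {1: 4, 2: 4, 4: 2, 5: 2}

# To collect every value per key, append to a list instead of overwriting.
grouped = {}
for k, v in zip(keys, values):
    grouped.setdefault(k, []).append(v)
print(grouped)  # {1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
```

So the zip() part carries over directly, but the values have to be accumulated into lists to get the result the question asks for.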

An explanation of how your code works and why it answers the question would be helpful. — skrrgwasme, Mar 10 '15 at 04:43
