7

I have a dataframe like this

df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]

What I'm currently using

d = {}
for i in df.A:
    d[i] = []
    for v in df.A[df.A == i].index:
        d[i].append(df.B[v])

Resulting in

{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}

But it's slow.

What is a pythonic way of doing this?

EDIT:

d = {}
for i in df.A.unique():
    d[i] = df[df.A == i].B.tolist()

Still seems like there must be a faster way

Thanks for any help!

David Feldman

3 Answers

16

You can use the DataFrame's groupby and to_dict methods, which keep the heavy lifting inside pandas rather than in explicit Python loops, e.g.:

import pandas as pd

df = pd.DataFrame(columns = ['A', 'B'])
df.A = [1,1,1,2,2,2,2,4,4,5]
df.B = [5,2,4,3,1,5,4,1,2,2]

d = df.groupby('A')['B'].apply(list).to_dict()

Gives you:

{1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
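
To make the one-liner easier to follow, here is a sketch that splits it into its three steps. The intermediate names (grouped, lists_per_key, d_alt) are only illustrative and are not part of the original answer; the dict comprehension at the end is one possible equivalent, not a claim that it is faster.

```python
import pandas as pd

# Same data as in the question, built directly from a dict of columns.
df = pd.DataFrame({
    "A": [1, 1, 1, 2, 2, 2, 2, 4, 4, 5],
    "B": [5, 2, 4, 3, 1, 5, 4, 1, 2, 2],
})

# Step 1: group the rows by column A and select column B from each group.
grouped = df.groupby("A")["B"]

# Step 2: collapse each group's B values into a Python list.
# The result is a Series indexed by the unique values of A.
lists_per_key = grouped.apply(list)

# Step 3: convert that Series into a plain dict.
d = lists_per_key.to_dict()

# Equivalent alternative: iterate over the (key, Series) pairs directly.
d_alt = {key: values.tolist() for key, values in grouped}

print(d)
```

Both d and d_alt should hold the same mapping from each value of A to the list of its B values.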
Jon Clements
  • I was looking for exactly this and I'm shocked by how simple it is! So, first we `groupby` the column that will supply the keys. From the grouped `df` we select the second column (which contains all the non-aggregated values) and apply the `list` function to it. Am I right? – Soren V. Raben Jun 30 '23 at 14:57
3

Look at this: list to dictionary conversion with multiple values per key?

from collections import defaultdict
d = defaultdict(list)
for i, j in zip(df.A,df.B):
    d[i].append(j)

Is this OK?

EDIT: If you want, you can convert it to simple dict:

d = dict(d)
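
For reference, a self-contained sketch of the same idea that runs without pandas. The plain lists a and b stand in for df.A and df.B, and the helper name group_values is hypothetical, introduced only for this illustration.

```python
from collections import defaultdict


def group_values(keys, values):
    """Collect every value under its key, preserving input order."""
    grouped = defaultdict(list)
    for key, value in zip(keys, values):
        # A missing key is created with an empty list on first access.
        grouped[key].append(value)
    # Convert to a plain dict so later lookups of missing keys raise KeyError.
    return dict(grouped)


a = [1, 1, 1, 2, 2, 2, 2, 4, 4, 5]  # stands in for df.A
b = [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]  # stands in for df.B

d = group_values(a, b)
print(d)  # {1: [5, 2, 4], 2: [3, 1, 5, 4], 4: [1, 2], 5: [2]}
```

This makes a single pass over the data, unlike the loop in the question, which filters the whole column once per key.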
Community
2

To create a simple dictionary from two lists in Python, you can write (there are variations):

mydict = dict(zip(list1, list2)) #assumes len(list1) ==  len(list2)

Here zip() is a Python built-in that pairs up the items at the same position in each list. In Python 2 it returns a list of tuples; in Python 3 it returns a lazy iterator. Passing those tuples to dict() builds a dictionary in which list1 supplies the keys and list2 supplies the values. Both lists should have the same length, because zip() stops at the end of the shorter one and silently drops the rest.

In Python 2 you can also use izip() from the itertools module, which returns an iterator instead of a list. The two are used the same way, but for large lists izip() is more memory efficient because it yields one tuple at a time instead of building the whole list in memory (Python 3's zip() already behaves this way). That said, a dictionary keeps all of its contents in memory, which is what makes looking up keys fast. (Sorry for the tangent.)
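
One caveat worth spelling out for the question's data: when the key list contains repeats, dict(zip(...)) keeps only the last value seen for each key, so it does not by itself produce lists of values. A short sketch, with the variable names keys, values, flat, and pairs chosen only for illustration:

```python
keys = [1, 1, 1, 2, 2, 2, 2, 4, 4, 5]
values = [5, 2, 4, 3, 1, 5, 4, 1, 2, 2]

# Later pairs overwrite earlier ones that share the same key.
flat = dict(zip(keys, values))
print(flat)  # {1: 4, 2: 4, 4: 2, 5: 2}

# zip() stops at the shorter input, so the extra item 3 is dropped.
pairs = list(zip([1, 2, 3], ["a", "b"]))
print(pairs)  # [(1, 'a'), (2, 'b')]
```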

reticentroot