From Pandas series, create dictionary with unique elements as keys, and their indices as values

Question

The goal is to create a dictionary from a pandas column (series) where the keys are the unique elements of the column, and the values are the row indices in which the elements occur. I currently have code that accomplishes this, but I'm wondering if there is a simpler and less hacky way to do it:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 100, size=(1000, 4)), columns=list('ABCD'))
idx = df['A'].reset_index().groupby('A')['index'].apply(tuple).to_dict()

I think that line is pretty neat if you ask me. Didn't you work on something similar here? https://stackoverflow.com/questions/49011261/fastest-way-to-combine-two-slices-from-two-pandas-dataframes-in-a-loop/49053835#49053835 — Celius Stingher, Jan 23 '20 at 15:47
Yes, and I've been using this method since then, but I'm wondering if there's any more pythonic way. — cherrytomato967, Jan 23 '20 at 15:50
@ALollz This is what I've been looking for! Thank you! I've never heard or seen ".groups" before! — cherrytomato967, Jan 23 '20 at 16:07
Yes, it's a very uncommon thing. The attribute is hidden at the very bottom of the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html#pandas.Grouper, with no real description of what it does. — ALollz, Jan 23 '20 at 16:09

score 4 · Accepted Answer · answered Jan 23 '20 at 16:08

This is exactly what the groups attribute of a GroupBy object provides. It returns a dict whose keys are the unique values and whose values are Index objects holding the row labels of the original DataFrame where each value occurs.

df.groupby('A').groups

{0: Int64Index([61, 466, 505, 619, 697, 811, 872], dtype='int64'),
 1: Int64Index([125, 254, 278, 330, 390, 396, 670, 732, 748, 849, 871, 880, 882,
                908, 943], dtype='int64'),
 2: Int64Index([77, 283, 401, 543, 544, 693, 816], dtype='int64'),
 ...}

Or if you really need the tuples:

{k: tuple(v) for k,v in df.groupby('A').groups.items()}
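
For a quick self-contained check, here is a minimal sketch on a small hand-made frame. The data and the name as_tuples are made up for illustration and are not from the question.

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({'A': [3, 1, 3, 2, 1, 3]})

# .groups maps each unique value of 'A' to an Index of the row labels
# where that value occurs.
groups = df.groupby('A').groups

# Convert each Index to a plain tuple if immutable values are needed.
as_tuples = {k: tuple(v) for k, v in groups.items()}
print(as_tuples)
```

Here the value 3 maps to the labels 0, 2 and 5, the value 1 to 1 and 4, and the value 2 to 3 alone. The values are index labels rather than positions, so they follow whatever index the frame carries.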

score 1 · Answer 2 · answered Jan 23 '20 at 15:54

You can do

d = {x: y['index'].tolist() for x, y in df.reset_index().groupby('A')}
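
A minimal sketch of this pattern on a small made-up frame, assuming the grouping is on the single column 'A' and that the frame has a default unnamed index, so that reset_index() names the new column 'index':

```python
import pandas as pd

# Small illustrative frame; the values are invented for this example.
df = pd.DataFrame({'A': [3, 1, 3, 2, 1, 3]})

# reset_index() exposes the row labels as a column named 'index';
# iterating the GroupBy yields (key, sub-frame) pairs.
d = {key: grp['index'].tolist() for key, grp in df.reset_index().groupby('A')}
print(d)
```

This gives lists instead of Index objects. If the index is named, or a column called 'index' already exists, the label column will have a different name and the lookup needs adjusting.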

BENY
