The goal is to create a dictionary from a pandas column (series) where the keys are the unique elements of the column, and the values are the row indices in which the elements occur. I currently have code that accomplishes this, but I'm wondering if there is a simpler and less hacky way to do it:

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(1000, 4)), columns=list('ABCD'))
idx = df['A'].reset_index().groupby('A')['index'].apply(tuple).to_dict()
  • I think that line is pretty neat if you ask me. Didn't you work on something similar here? https://stackoverflow.com/questions/49011261/fastest-way-to-combine-two-slices-from-two-pandas-dataframes-in-a-loop/49053835#49053835 – Celius Stingher Jan 23 '20 at 15:47
  • Yes, and I've been using this method since then, but I'm wondering if there's any more pythonic way. – cherrytomato967 Jan 23 '20 at 15:50
  • If @Jezrael can't do it, I quit coding. – Celius Stingher Jan 23 '20 at 15:52
  • I am going to test, but I think this is the best way. – ansev Jan 23 '20 at 15:54
  • Could you use a Series instead of a tuple or list? – ansev Jan 23 '20 at 16:01
  • @ALollz This is what I've been looking for! Thank you! I've never heard or seen ".groups" before! – cherrytomato967 Jan 23 '20 at 16:07
  • Yes, it's a very uncommon thing. The attribute is hidden at the very bottom of the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Grouper.html#pandas.Grouper, with no real description of what it does. – ALollz Jan 23 '20 at 16:09

2 Answers

This is the groups attribute of a GroupBy object. It returns a dict that maps each unique value to an Index holding the row labels of the original DataFrame where that value occurs.

df.groupby('A').groups

{0: Int64Index([61, 466, 505, 619, 697, 811, 872], dtype='int64'),
 1: Int64Index([125, 254, 278, 330, 390, 396, 670, 732, 748, 849, 871, 880, 882,
                908, 943], dtype='int64'),
 2: Int64Index([77, 283, 401, 543, 544, 693, 816], dtype='int64'),
 ...}

Or if you really need the tuples:

{k: tuple(v) for k,v in df.groupby('A').groups.items()}
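
As a small, deterministic illustration of the two snippets above, here is a sketch on a made-up six-row frame (the data is hypothetical, not taken from the question). It converts each Index with `.tolist()` so the tuples hold plain Python ints:

```python
import pandas as pd

# Hypothetical toy data, chosen so the result is easy to check by eye.
df = pd.DataFrame({'A': [3, 1, 3, 2, 1, 3]})

# Maps each unique value in 'A' to an Index of the row labels where it occurs.
groups = df.groupby('A').groups

# Convert each Index to a plain tuple of Python ints.
idx = {k: tuple(v.tolist()) for k, v in groups.items()}
print(idx)
```

Here the value 3 appears in rows 0, 2 and 5, so its entry in `idx` is the tuple of those three row labels.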
ALollz
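You can also build the dict with a comprehension over the groups. Group by 'A' itself; grouping by list(df) would key the result on every column rather than on 'A' alone:

d = {x: y['index'].tolist() for x, y in df.reset_index().groupby('A')}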
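
To key the dictionary by a single column, as the question asks, the comprehension can group by that column's name. The sketch below uses made-up data and assumes the frame has a default RangeIndex, so that `reset_index()` adds a column literally named `index`. It also builds the same mapping from `.groups` for comparison:

```python
import pandas as pd

# Hypothetical toy data; with the default RangeIndex, reset_index() adds a column named 'index'.
df = pd.DataFrame({'A': [3, 1, 3, 2, 1, 3], 'B': list('uvwxyz')})

# Group by 'A' only and collect the original row labels of each group as a list.
d = {x: y['index'].tolist() for x, y in df.reset_index().groupby('A')}

# The same mapping via .groups, converted to lists for comparison.
g = {k: v.tolist() for k, v in df.groupby('A').groups.items()}

print(d == g)
```

If the index already has a name, `reset_index()` uses that name for the new column instead of `index`, and the lookup would need adjusting.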
BENY