
I have a pandas DataFrame df with the columns user and product. It describes which user buys which products, with one row per purchase, so repeated purchases of the same product appear repeatedly. E.g. if user 1 buys product 23 three times, df will contain the entry 23 three times for user 1. For every user, I am interested only in those products that this user bought at least three times. Hence, I do s = df.groupby('user').product.value_counts(), and then I filter s = s[s>2] to discard the products not bought frequently enough. Then s looks something like this:

user     product
3        39190         9
         47766         8
         21903         8
6        21903         5
         38293         5
11       8309          7
         27959         7
         14947         5
         35948         4
         8670          4
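
For anyone who wants to reproduce the setup, here is a minimal sketch. The purchase data below is made up for illustration and is not the asker's data; the name counts is also just a placeholder.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase (illustrative data only).
df = pd.DataFrame({
    "user":    [1, 1, 1, 1, 2, 2, 2, 2, 2],
    "product": [23, 23, 23, 7, 5, 5, 5, 5, 9],
})

# Count purchases per (user, product) pair. Bracket access is used here
# so the column lookup cannot be confused with a method named `product`.
counts = df.groupby("user")["product"].value_counts()

# Keep only the pairs bought at least three times.
s = counts[counts > 2]
```

The result is a Series whose index is a MultiIndex with the levels user and product, which is why the products live in the index rather than in a column.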

Having filtered the data, I am not interested in the frequencies (the right column) any more.

How can I create a dict of the form user:product based on s? I have trouble accessing the individual columns/index of the Series.

DominikS

1 Answer


Option 0

s.reset_index(name='count').groupby('user')['product'].apply(list).to_dict()

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}

Option 1

s.groupby(level='user').apply(lambda x: x.loc[x.name].index.tolist()).to_dict()

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}

Option 2

from collections import defaultdict

d = defaultdict(list)

for user, product in s.index:
    d[user].append(product)

dict(d)

{3: [39190, 47766, 21903],
 6: [21903, 38293],
 11: [8309, 27959, 14947, 35948, 8670]}
piRSquared
  • Thanks, that solves it! In option 0 I had to provide a new column name in reset_index(), otherwise I get a naming error (same as described [here](https://stackoverflow.com/questions/39778686/pandas-reset-index-after-groupby-value-counts)). – DominikS Jul 14 '17 at 21:32
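
The naming clash mentioned in this comment can be sidestepped in at least two ways. The sketch below rebuilds part of the Series from the question by hand and assumes the counts are named "product", which is what triggers the clash; the names flat, pairs, result_a and result_b are placeholders.

```python
import pandas as pd

# Rebuild part of the filtered Series from the question. Naming the values
# "product" is an assumption that reproduces the clash with the index level.
index = pd.MultiIndex.from_tuples(
    [(3, 39190), (3, 47766), (3, 21903), (6, 21903), (6, 38293)],
    names=["user", "product"],
)
s = pd.Series([9, 8, 8, 5, 5], index=index, name="product")

# Variant A: give the counts column its own name when resetting the index.
flat = s.reset_index(name="count")
result_a = flat.groupby("user")["product"].apply(list).to_dict()

# Variant B: ignore the counts and turn only the index into a frame.
pairs = s.index.to_frame(index=False)
result_b = pairs.groupby("user")["product"].apply(list).to_dict()
```

Variant B never touches the counts, so it does not depend on what the Series happens to be called.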