0

I have a Pandas DataFrame, df, with the following columns:

user_id, date_of_visit

I want to find all user_ids who visited at least twice in any 4 day window. So if I had:

user_id, date_of_visit

1, 1/4/15

1, 1/6/15

2, 1/4/15

2, 1/12/15

2, 1/23/15

Then my function would return [1]

I could just go through each row, but I'm wondering if there's a better way to leverage Pandas. Maybe something with:

df.groupby('user_id')...?

Thanks!

anon_swe
  • 8,791
  • 24
  • 85
  • 145
  • 2
    Can you make your example [reproducible](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples)? – cmaher Apr 24 '18 at 16:27

1 Answers1

1

You can do with rolling within filter

#df.date_of_visit=pd.to_datetime(df.date_of_visit)

df.groupby('user_id').filter(lambda x : x.set_index('date_of_visit').rolling('4d').count().gt(1).any()).user_id.unique()
Out[525]: array([1], dtype=int64)
BENY
  • 317,841
  • 20
  • 164
  • 234
  • Thanks for the help. I used your approach but get `TypeError: filter function returned a Series, but expected a scalar bool` . Happy to provide a reproducible example if helpful – anon_swe Apr 24 '18 at 19:35