1

I have a pandas dataframe containing dates of when a customer enters a shop. I'm looking for a method that will allow me to count the number of times a customer has visited a shop in the past month from the current Date_Visited including the current visit.

So, for a minimal dataset below

Customer_ID     Date_Visited (Year-Month-Day)
1               2020-07-10
2               2020-07-09
1               2020-01-01               
2               2020-07-08
1               2020-07-08
3               2020-07-01

I'm looking for an output of

Customer_ID     Date_Visited    visit_times
1               2020-07-10      2
2               2020-07-09      2
1               2020-01-01      1         
2               2020-07-08      1 
1               2020-07-08      1
3               2020-07-01      1

I've been able to use a solution involving loops - but this would be inefficient for large dataframes.

I've thought about trying to merge two copies of the dataframe and use a similar approach to that in Counting Number of Occurrences Between Dates (Given an ID value) From Another Dataframe but I'm not sure if this method is the best way to approach this problem

Oli
  • 13
  • 2

1 Answers1

1

You can group by Customer_ID and year/month (using pandas.Grouper on the sorted dataframe (pandas.DataFrame.sort_values using the date column as key) and apply a cumcount per group (you need to add 1 as the count starts from 0 in python):

df['visit_times'] = (df.sort_values(by='Date_Visited (Year-Month-Day)')
                       .groupby(['Customer_ID',
                                 pd.Grouper(freq='M', key='Date_Visited (Year-Month-Day)')
                                ])
                       .cumcount()+1
                               )

output:

   Customer_ID Date_Visited (Year-Month-Day)  visit_times
0            1                    2020-07-10            2
1            2                    2020-07-09            2
2            1                    2020-01-01            1
3            2                    2020-07-08            1
4            1                    2020-07-08            1
5            3                    2020-07-01            1
mozway
  • 194,879
  • 13
  • 39
  • 75
  • I have learnt `pd.Grouper` today: Thank you! I have added it to my list of favorite functions. I would give +300 points if I could. – Prayson W. Daniel Jul 27 '21 at 14:49
  • @PraysonW.Daniel Glad it helped. You can always [set a bounty](https://stackoverflow.com/help/bounty) on this question in 48 hours if you really want, "reward existing answer" is [one of the reasons](https://stackoverflow.blog/2011/09/23/bounty-reasons-and-post-notices/) ;) – mozway Jul 27 '21 at 14:54
  • It am not sure how that works as I am not the question author – Prayson W. Daniel Jul 27 '21 at 14:57
  • I have also never done it, but according to the help, there should be a "start a bounty" link below the question when the question is at least 2 days old. Don't feel like you have to do it though ;) – mozway Jul 27 '21 at 15:03