My dataframe consists of two columns: commodity ids and their corresponding transaction dates over a ten-year period, similar to the table below.
I want to find the total count of commodities that were sold twice within a relatively short period, say within 30 days, during these ten years. In other words, I want to know how many duplicates of commudity_id occur within 30-day periods over these ten years.
   transaction_date  Commudity_id
0        2010-01-01        512624
1        2010-01-01        499817
2        2010-01-01        388958
3        2010-01-01        708544
4        2010-01-01        227012
. . .
. . .
. . .
I tried to use a pivot table as shown below, but the output is not what I am looking for.
# Counts the rows for each (transaction_date, commudity_id) pair
dups_goods_id = df.pivot_table(index=['transaction_date', 'commudity_id'], aggfunc='size')
print(dups_goods_id)
I am looking for something like this:
30_days_dups_count = 2387
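To make the intended count more concrete, here is a minimal sketch of one possible reading of it: a sale counts as a duplicate when the same commodity id was already sold at most 30 days earlier. The function name count_30_day_dups, its parameters, and the sample rows are hypothetical and only for illustration; the column names follow the pivot_table call above. Other readings (for example, fixed calendar windows) would give different numbers.

```python
import pandas as pd


def count_30_day_dups(df, id_col="commudity_id", date_col="transaction_date",
                      window_days=30):
    """Count sales made within `window_days` of the previous sale of the same id.

    Assumption: a "duplicate" is any sale whose gap to the previous sale of
    the same commodity is at most `window_days` days.
    """
    data = df[[id_col, date_col]].copy()
    data[date_col] = pd.to_datetime(data[date_col])
    data = data.sort_values([id_col, date_col])
    # Gap between each sale and the previous sale of the same commodity;
    # the first sale of each commodity has no predecessor (NaT).
    gap = data.groupby(id_col)[date_col].diff()
    # Comparisons against NaT evaluate to False, so first sales are not counted.
    return int((gap <= pd.Timedelta(days=window_days)).sum())


# Hypothetical sample rows, not taken from the real data
sample = pd.DataFrame({
    "transaction_date": ["2010-01-01", "2010-01-15", "2010-03-01",
                         "2010-01-01", "2010-06-01"],
    "commudity_id": [512624, 512624, 512624, 499817, 499817],
})

thirty_days_dups_count = count_30_day_dups(sample)
print(thirty_days_dups_count)
```

In this sample only the second sale of 512624 falls within 30 days of an earlier sale of the same id, so it is the only row counted. Note that under this reading two sales of the same id on the same day also count as a duplicate.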