Pandas - Rolling count grouping 7 day windows by location IDs

Question

Using Python & Pandas I'm trying to create a count of occurrences for each location utilizing a 7 day window. For simplicity, I'm looking at the first 2 weeks of June 2020,

And I'm trying to aggregate these rows into a count of occurances in a 7 day window, returning results like

Right now, I have the following code,

#Ensure it's a date format and then create the index to be date
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('date', drop = True)

#Group the location IDs and utilizing the index (date), create a rolling count for a 7 day window
week_windows = df.groupby('location_id').rolling('7D').count()

What I'm receiving (and don't want) seems to be an indexing of the count,

Looks like you would need call `groupby()` with both date and location_id. Can you post sample data in copy/paste form (e.g., https://stackoverflow.com/a/30424537/13608599) — jsmart, Jul 30 '20 at 15:49

Valdi_Bo · Answer 1 · 2020-07-30T19:29:38.700

I assume that date column is of datetime type.

Unfortunately, rolling operates only on numeric columns, so you should generate the result other way.

Use grouping by:

location_id,
date with Week frequency.

Then:

For each group, count the number of elements in an arbitrary column.
To generate just the wanted ordering of levels of the created MultiIndex, swap them. For the time being, the result is a Series.
And finally, to get a DataFrame, reset the index, renaming the count column.

The code to do it is:

result = df.groupby(['location_id', pd.Grouper(key='date', freq='W-MON', closed='left',
    label='left')]).location_id.apply(lambda grp: grp.count()).swaplevel()\
    .reset_index(name='count')

The result is:

        date location_id  count
0 2020-06-01      D29139      3
1 2020-06-08      D29139      1
2 2020-06-01      D40091      2
3 2020-06-08      D40091      2

score 0 · Answer 2 · answered Jul 30 '20 at 19:37

Looks like your requirement is resample() not rolling()

a = np.array(np.meshgrid(pd.date_range(dt.datetime(2020,7,1), dt.datetime(2020,7,29), freq='D'),
                     ['D29139', 'D40091', 'D29139'])).reshape(2,-1)
df = pd.DataFrame({"date":a[0], "location_id":a[1]})
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")
print(df[df.index>=dt.datetime(2020,7,29)].to_string())
print(df.groupby("location_id").resample('7D').count().to_string())

output

           location_id
date                  
2020-07-29      D29139
2020-07-29      D40091
2020-07-29      D29139
                        location_id
location_id date                   
D29139      2020-07-01           14
            2020-07-08           14
            2020-07-15           14
            2020-07-22           14
            2020-07-29            2
D40091      2020-07-01            7
            2020-07-08            7
            2020-07-15            7
            2020-07-22            7
            2020-07-29            1

Pandas - Rolling count grouping 7 day windows by location IDs

2 Answers2