0

Using Python & Pandas I'm trying to create a count of occurrences for each location utilizing a 7 day window. For simplicity, I'm looking at the first 2 weeks of June 2020,

enter image description here

And I'm trying to aggregate these rows into a count of occurances in a 7 day window, returning results like

date location_id count
2020-06-01  D29139  3
2020-06-08  D29139  1
2020-06-01  D40091  2
2020-06-08  D40091  2

Right now, I have the following code,

#Ensure it's a date format and then create the index to be date
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
df = df.set_index('date', drop = True)

#Group the location IDs and utilizing the index (date), create a rolling count for a 7 day window
week_windows = df.groupby('location_id').rolling('7D').count()

What I'm receiving (and don't want) seems to be an indexing of the count,

enter image description here

Grygger
  • 83
  • 6
  • Looks like you would need call `groupby()` with both date and location_id. Can you post sample data in copy/paste form (e.g., https://stackoverflow.com/a/30424537/13608599) – jsmart Jul 30 '20 at 15:49

2 Answers2

0

I assume that date column is of datetime type.

Unfortunately, rolling operates only on numeric columns, so you should generate the result other way.

Use grouping by:

  • location_id,
  • date with Week frequency.

Then:

  • For each group, count the number of elements in an arbitrary column.
  • To generate just the wanted ordering of levels of the created MultiIndex, swap them. For the time being, the result is a Series.
  • And finally, to get a DataFrame, reset the index, renaming the count column.

The code to do it is:

result = df.groupby(['location_id', pd.Grouper(key='date', freq='W-MON', closed='left',
    label='left')]).location_id.apply(lambda grp: grp.count()).swaplevel()\
    .reset_index(name='count')

The result is:

        date location_id  count
0 2020-06-01      D29139      3
1 2020-06-08      D29139      1
2 2020-06-01      D40091      2
3 2020-06-08      D40091      2
Valdi_Bo
  • 30,023
  • 4
  • 23
  • 41
0

Looks like your requirement is resample() not rolling()

a = np.array(np.meshgrid(pd.date_range(dt.datetime(2020,7,1), dt.datetime(2020,7,29), freq='D'),
                     ['D29139', 'D40091', 'D29139'])).reshape(2,-1)
df = pd.DataFrame({"date":a[0], "location_id":a[1]})
df["date"] = pd.to_datetime(df["date"])
df = df.set_index("date")
print(df[df.index>=dt.datetime(2020,7,29)].to_string())
print(df.groupby("location_id").resample('7D').count().to_string())

output

           location_id
date                  
2020-07-29      D29139
2020-07-29      D40091
2020-07-29      D29139
                        location_id
location_id date                   
D29139      2020-07-01           14
            2020-07-08           14
            2020-07-15           14
            2020-07-22           14
            2020-07-29            2
D40091      2020-07-01            7
            2020-07-08            7
            2020-07-15            7
            2020-07-22            7
            2020-07-29            1
Rob Raymond
  • 29,118
  • 3
  • 14
  • 30