I am combining a bunch of different datasets to create an aggregation to analyse in 15 minute intervals.

The dataframe I currently have looks like this:

<bound method NDFrame.to_clipboard of                        id                       user_id  sentiment  magnitude  \
2020-10-04 14:06:00  10.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.1        0.1   
2020-10-04 14:06:05  11.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.8        0.8   
2020-10-05 12:28:58  12.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.2        0.2   
2020-10-05 12:29:16  13.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2       -0.2        0.2   
2020-10-05 12:29:31  14.0  cPL1Fg7BqRXvSFKeU1mJT7KCCTq2        0.2        0.2   

                     angry  disgusted  fearful  happy  neutral  sad  \
2020-10-04 14:06:00    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-04 14:06:05    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-05 12:28:58    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-05 12:29:16    NaN        NaN      NaN    NaN      NaN  NaN   
2020-10-05 12:29:31    NaN        NaN      NaN    NaN      NaN  NaN   

                     surprised  heartRate  steps  
2020-10-04 14:06:00        NaN        NaN    NaN  
2020-10-04 14:06:05        NaN        NaN    NaN  
2020-10-05 12:28:58        NaN        NaN    NaN  
2020-10-05 12:29:16        NaN        NaN    NaN  
2020-10-05 12:29:31        NaN        NaN    NaN  >

I want to aggregate the dataframe into 15 minute intervals.

I think groupby is the best approach, but I'm finding it hard to get it to work well.

Thanks in advance,

LeCoda
  • Look at `DataFrame.resample` https://stackoverflow.com/questions/42191697/resample-daily-data-to-monthly-with-pandas-date-formatting – Erfan Apr 25 '21 at 10:55

1 Answer

There are two options: resample, or groupby with pd.Grouper (which is performant).
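
For comparison, here is a minimal sketch of the resample route. It assumes the timestamps are the DataFrame's DatetimeIndex, as in the question's printout. The frame is a hand-built stand-in with only the id and sentiment columns, and taking the mean of sentiment is an illustrative choice rather than something the question specifies.

```python
import pandas as pd

# Stand-in for the question's data: the timestamps live in the index.
idx = pd.to_datetime([
    "2020-10-04 14:06:00",
    "2020-10-04 14:06:05",
    "2020-10-05 12:28:58",
    "2020-10-05 12:29:16",
    "2020-10-05 12:29:31",
])
df = pd.DataFrame(
    {
        "id": [10.0, 11.0, 12.0, 13.0, 14.0],
        "sentiment": [-0.1, -0.8, -0.2, -0.2, 0.2],
    },
    index=idx,
)

# resample bins on the DatetimeIndex directly, so no key column is needed.
# The per-column aggregations here are assumptions made for illustration.
resampled = df.resample("15min").agg({"id": "sum", "sentiment": "mean"})

# Intervals with no rows are still emitted (sum gives 0.0, mean gives NaN).
# Drop them if only populated intervals are of interest.
populated = resampled.dropna(subset=["sentiment"])
print(populated)
```

Because resample emits every interval between the first and last timestamp, sparse data like this produces mostly empty rows, which is why the sketch filters them afterwards.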

Here is an example that uses pd.Grouper to sum column values over 15-minute intervals.

Code

df.groupby(pd.Grouper(key='date', freq='15min')).sum().reset_index()

Input sample from your data

    date                 id
0   2020-10-04 14:06:00 10.0
1   2020-10-04 14:06:05 11.0
2   2020-10-05 12:28:58 12.0
3   2020-10-05 12:29:16 13.0
4   2020-10-05 12:29:31 14.0

Output (first five rows)

    date                 id
0   2020-10-04 14:00:00 21.0
1   2020-10-04 14:15:00 0.0
2   2020-10-04 14:30:00 0.0
3   2020-10-04 14:45:00 0.0
4   2020-10-04 15:00:00 0.0
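
The example can be reproduced end to end with a short sketch. It assumes the timestamps are in a column named date, as in the input sample above. In the question's frame they are the index instead, in which case pd.Grouper can be used without key, as the second half shows. The variable names are illustrative.

```python
import pandas as pd

# Input sample from the answer: timestamps in a regular "date" column.
df = pd.DataFrame({
    "date": pd.to_datetime([
        "2020-10-04 14:06:00",
        "2020-10-04 14:06:05",
        "2020-10-05 12:28:58",
        "2020-10-05 12:29:16",
        "2020-10-05 12:29:31",
    ]),
    "id": [10.0, 11.0, 12.0, 13.0, 14.0],
})

# Group rows into 15-minute bins keyed on the "date" column and sum each bin.
out = df.groupby(pd.Grouper(key="date", freq="15min")).sum().reset_index()
print(out.head())

# Same grouping when the timestamps are the index rather than a column:
# omit `key` and pd.Grouper bins on the DatetimeIndex.
indexed = df.set_index("date")
out_indexed = indexed.groupby(pd.Grouper(freq="15min")).sum()
```

Both calls return one row per 15-minute interval between the first and last timestamp, with empty intervals summing to 0.0.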
Utsav