I am fairly new to this, so please bear with me. I have a DataFrame (df) whose index is a datetime index. The other columns are Concentration and a Count column that is 1 in every row.
Timestamp | Concentration | Count |
---|---|---|
2018-01-01 08:07:00 | 32.675305 | 1 |
2018-01-01 08:20:00 | 22.816844 | 1 |
2018-01-01 08:28:00 | 17.183438 | 1 |
2018-01-01 08:37:00 | 18.591789 | 1 |
I want to clean up the df by keeping only the rows that fall in hours with at least 3 recorded concentration values.
I tried resampling by hour and summing the Count column, which tells me whether each hour meets the threshold of 3 data points. I can then drop the hours where the count is less than 3.

    df2 = df.resample('H').sum()
    df3 = df2[~(df2['Count'] < 3)]

At this point, though, the concentrations have also been summed, which is not what I want in the end. Is there a way to get back to the original, un-resampled rows, minus the ones from hours that were dropped?
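One possible way to do that is to reuse the hourly result as a lookup: floor each original timestamp to its hour and keep the rows whose hour is still present after the filter. The following is only a minimal sketch. The first four rows come from the table above, the two 09:xx rows are invented to provide an hour that fails the threshold, and the name `kept` is hypothetical. It uses "h" for the hourly alias, since recent pandas releases deprecate the uppercase "H".

```python
import pandas as pd

# First four rows are from the question; the 09:xx rows are made up
# so that one hour has fewer than 3 readings.
idx = pd.to_datetime(
    [
        "2018-01-01 08:07:00",
        "2018-01-01 08:20:00",
        "2018-01-01 08:28:00",
        "2018-01-01 08:37:00",
        "2018-01-01 09:05:00",  # hypothetical
        "2018-01-01 09:40:00",  # hypothetical
    ]
)
df = pd.DataFrame(
    {
        "Concentration": [32.675305, 22.816844, 17.183438, 18.591789, 20.0, 21.0],
        "Count": 1,
    },
    index=idx,
)

# Same two steps as in the question: hourly sums, then keep hours with >= 3 rows.
df2 = df.resample("h").sum()
df3 = df2[df2["Count"] >= 3]

# Floor every original timestamp to its hour and keep the rows whose hour
# survived the filter. The original concentrations are untouched.
kept = df[df.index.floor("h").isin(df3.index)]
print(kept)
```

With this sample data, only the four 08:xx rows remain, each with its original concentration value.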
Is there another way to do this that would work better?
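An alternative that skips the resample step is to group the rows by their floored hour and use `transform("count")`, which returns the per-hour count aligned to the original rows, so it can be used directly as a filter. This is a sketch under the same assumptions as the sample data in the question: the two 09:xx rows are invented, and `hour`, `per_hour`, and `filtered` are hypothetical names.

```python
import pandas as pd

# First four rows are from the question; the 09:xx rows are made up
# so that one hour has fewer than 3 readings.
idx = pd.to_datetime(
    [
        "2018-01-01 08:07:00",
        "2018-01-01 08:20:00",
        "2018-01-01 08:28:00",
        "2018-01-01 08:37:00",
        "2018-01-01 09:05:00",  # hypothetical
        "2018-01-01 09:40:00",  # hypothetical
    ]
)
df = pd.DataFrame(
    {"Concentration": [32.675305, 22.816844, 17.183438, 18.591789, 20.0, 21.0]},
    index=idx,
)

# Label each row with the start of its clock hour.
hour = df.index.floor("h")

# Count the non-null concentration values in each row's hour.
# transform() returns one value per original row, not one per hour.
per_hour = df.groupby(hour)["Concentration"].transform("count")

# Keep only rows that belong to an hour with at least 3 readings.
filtered = df[per_hour >= 3]
print(filtered)
```

With this approach the Count column of 1s is not needed, because the count is computed from the Concentration column itself.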