I read several asked questions on stackoverflow and internet in general (like https://stackoverflow.com/questions/44087637/pandas-how-does-indexslice-work,https://stackoverflow.com/questions/60000477/select-last-row-from-each-column-of-multi-index-pandas-dataframe-based-on-time, https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-shown-levels). But none helped.
My database: I have a dataframe with several sensors, each sensor has several timestamp based entries and a value. I resampled the dataframe based on the sensor´s id in 60S intervals and calculated the a new shift value to see, if the sensor value goes up or down.
My goal: I want to remove all sensors (whole sensor data) if the most recent value went down for the sensor in question and keep all other sensors.
My problem: I can't access the timestamps in the 2nd column of the multiindex based on the position. It seems easy and obvious at first, but i fail.
My current approach: I tried using
timestamp = datetime.datetime.now() - datetime.timedelta(minutes=1)
df_resampled.loc[idx[:, timestamp], 'change']
This works for some sensors, but not all have recent data. Even -2 or -3 can skip a few sensors. A iloc and -1 for the last dataset on 2nd level would be the way to go, but doesn't work with idx and will throw errors.
My solution would be using
df_resampled.index.get_level_values(0)
which looks like
Index(['id_0342', 'id_0342', 'id_0342', 'id_0342', 'id_0342', 'id_0342', 'id_0342',
'id_0342', 'id_0342', 'id_0342',
...
'id_1349', 'id_1349', 'id_1349', 'id_1349', 'id_1349', 'id_1349', 'id_1349',
'id_1349', 'id_1349', 'id_1349'],
dtype='object', name='sensor_id', length=7237)
to get a list of all rows with all sensor names and clear that list from duplicates where first duplicates become False and only last ones True. But this doesn´t seem very pythonic.
The dataframe should look / transform like this:
After loading from database:
sensor_id timestamp value
0 id_8032 2022-10-26 20:27:13 15.160
1 id_8253 2022-10-26 20:27:13 58.750
2 id_7819 2022-10-26 20:28:12 4.196
3 id_7904 2022-10-26 20:28:12 13.556
4 id_8041 2022-10-26 20:28:12 31.740
5 id_8409 2022-10-26 20:28:22 35.350
6 id_7711 2022-10-26 20:28:42 1.146
7 id_8081 2022-10-26 20:29:30 17.530
8 id_8048 2022-10-26 20:29:31 23.650
9 id_8071 2022-10-26 20:29:31 6.890
...
After resampling:
value value_diff
sensor_id timestamp
id_7711 2022-10-26 22:25:00 24.7 0.0
2022-10-26 22:26:00 24.7 0.0
2022-10-26 22:27:00 24.7 0.0
2022-10-26 22:28:00 24.7 0.0
2022-10-26 22:29:00 24.7 0.0
...
id_8409 2022-10-26 22:25:00 9.264 -0.0045
2022-10-26 22:26:00 9.268 0.0000
2022-10-26 22:27:00 9.269 0.0025
2022-10-26 22:28:00 9.272 0.0070
2022-10-26 22:29:00 9.275 0.0100
...
id_8071 2022-10-26 22:26:00 35.166667 0.066667
2022-10-26 22:27:00 35.200000 0.100000
2022-10-26 22:28:00 35.200000 0.100000
2022-10-26 22:29:00 35.200000 0.100000
2022-10-26 22:30:00 35.200000 0.066667
This is what i want in the end (remove all sensors like id_7711, where last row of the group has a difference of 0 or less):
value value_diff
sensor_id timestamp
id_8409 2022-10-26 22:25:00 9.264 -0.0045
2022-10-26 22:26:00 9.268 0.0000
2022-10-26 22:27:00 9.269 0.0025
2022-10-26 22:28:00 9.272 0.0070
2022-10-26 22:29:00 9.275 0.0100
...
id_8071 2022-10-26 22:26:00 35.166667 0.066667
2022-10-26 22:27:00 35.200000 0.100000
2022-10-26 22:28:00 35.200000 0.100000
2022-10-26 22:29:00 35.200000 0.100000
2022-10-26 22:30:00 35.200000 0.066667