Pandas MultiIndex slicing using IndexSlice and iloc instead of loc

Question

I read several asked questions on stackoverflow and internet in general (like https://stackoverflow.com/questions/44087637/pandas-how-does-indexslice-work,https://stackoverflow.com/questions/60000477/select-last-row-from-each-column-of-multi-index-pandas-dataframe-based-on-time, https://pandas.pydata.org/docs/user_guide/advanced.html#advanced-shown-levels). But none helped.

My database: I have a dataframe with several sensors, each sensor has several timestamp based entries and a value. I resampled the dataframe based on the sensor´s id in 60S intervals and calculated the a new shift value to see, if the sensor value goes up or down.

My goal: I want to remove all sensors (whole sensor data) if the most recent value went down for the sensor in question and keep all other sensors.

My problem: I can't access the timestamps in the 2nd column of the multiindex based on the position. It seems easy and obvious at first, but i fail.

My current approach: I tried using

timestamp = datetime.datetime.now() - datetime.timedelta(minutes=1)
df_resampled.loc[idx[:, timestamp], 'change']

This works for some sensors, but not all have recent data. Even -2 or -3 can skip a few sensors. A iloc and -1 for the last dataset on 2nd level would be the way to go, but doesn't work with idx and will throw errors.

My solution would be using

df_resampled.index.get_level_values(0)

which looks like

Index(['id_0342', 'id_0342', 'id_0342', 'id_0342', 'id_0342', 'id_0342', 'id_0342',
       'id_0342', 'id_0342', 'id_0342',
       ...
       'id_1349', 'id_1349', 'id_1349', 'id_1349', 'id_1349', 'id_1349', 'id_1349',
       'id_1349', 'id_1349', 'id_1349'],
      dtype='object', name='sensor_id', length=7237)

to get a list of all rows with all sensor names and clear that list from duplicates where first duplicates become False and only last ones True. But this doesn´t seem very pythonic.

The dataframe should look / transform like this:

After loading from database:

    sensor_id   timestamp               value
0   id_8032     2022-10-26 20:27:13     15.160
1   id_8253     2022-10-26 20:27:13     58.750
2   id_7819     2022-10-26 20:28:12     4.196
3   id_7904     2022-10-26 20:28:12     13.556
4   id_8041     2022-10-26 20:28:12     31.740
5   id_8409     2022-10-26 20:28:22     35.350
6   id_7711     2022-10-26 20:28:42     1.146
7   id_8081     2022-10-26 20:29:30     17.530
8   id_8048     2022-10-26 20:29:31     23.650
9   id_8071     2022-10-26 20:29:31     6.890
...

After resampling:

                                    value       value_diff
sensor_id   timestamp           
id_7711     2022-10-26 22:25:00     24.7        0.0
            2022-10-26 22:26:00     24.7        0.0
            2022-10-26 22:27:00     24.7        0.0
            2022-10-26 22:28:00     24.7        0.0
            2022-10-26 22:29:00     24.7        0.0
...
id_8409     2022-10-26 22:25:00     9.264       -0.0045
            2022-10-26 22:26:00     9.268       0.0000
            2022-10-26 22:27:00     9.269       0.0025
            2022-10-26 22:28:00     9.272       0.0070
            2022-10-26 22:29:00     9.275       0.0100
...
id_8071     2022-10-26 22:26:00     35.166667   0.066667
            2022-10-26 22:27:00     35.200000   0.100000
            2022-10-26 22:28:00     35.200000   0.100000
            2022-10-26 22:29:00     35.200000   0.100000
            2022-10-26 22:30:00     35.200000   0.066667

This is what i want in the end (remove all sensors like id_7711, where last row of the group has a difference of 0 or less):

                                    value       value_diff
sensor_id   timestamp           
id_8409     2022-10-26 22:25:00     9.264       -0.0045
            2022-10-26 22:26:00     9.268       0.0000
            2022-10-26 22:27:00     9.269       0.0025
            2022-10-26 22:28:00     9.272       0.0070
            2022-10-26 22:29:00     9.275       0.0100
...
id_8071     2022-10-26 22:26:00     35.166667   0.066667
            2022-10-26 22:27:00     35.200000   0.100000
            2022-10-26 22:28:00     35.200000   0.100000
            2022-10-26 22:29:00     35.200000   0.100000
            2022-10-26 22:30:00     35.200000   0.066667

Kindly provide sample dataframe with expected output dataframe — sammywemmy, Oct 26 '22 at 07:28
Is this problem too obvious or is there no obvious solution? — ngn16920, Nov 02 '22 at 19:31

Pandas MultiIndex slicing using IndexSlice and iloc instead of loc

0 Answers0