Filling NaN values of a df based on different conditions

Question

I have a df like this:

Timestamp                                 Time  Power    Total Energy              ID     Energy
2020-04-09 06:45:00 2020-04-09 04:44:40.559719   7500       5636690.0               1      140.0    
2020-04-09 06:46:00 2020-04-09 04:44:40.559719   7500       5636710.0               1      160.0    
2020-04-09 06:47:00                        NaT    NaN             NaN             NaN        NaN    
2020-04-09 06:48:00 2020-04-09 04:44:40.559719   7500       5636960.0               1      410.0
2020-04-09 06:49:00                        NaT    NaN             NaN             NaN        NaN
2020-04-09 06:50:00                        NaT    NaN             NaN             NaN        NaN
2020-04-09 06:51:00                        NaT    NaN             NaN             NaN        NaN
...                                        ...    ...             ...             ...        ...
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569270.0               5      100.0
2020-04-30 23:54:00                        NaT    NaN             NaN             NaN        NaN
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569370.0               5      180.0

different cycles (df['ID']) are marked by different IDs. Within a cycle (ID appears both before and after the nan value) the power of the two "surrounding" lines should be averaged, ID and Time should be continued and in the column energy the last existing value of the column energy should be entered. Outside the cycle (ID before != next ID) the power as well as the energy should be set to 0 and the columns ID/Time should be set to '-'. for the column total energy the values should simply be continued.

Expected outcome:

Timestamp                                 Time  Power    Total Energy              ID     Energy
2020-04-09 06:45:00 2020-04-09 04:44:40.559719   7500       5636690.0               1      140.0    
2020-04-09 06:46:00 2020-04-09 04:44:40.559719   7500       5636710.0               1      160.0    
2020-04-09 06:47:00 2020-04-09 04:44:40.559719   7500       5636710.0               1      160.0
2020-04-09 06:48:00 2020-04-09 04:44:40.559719   7500       5636960.0               1      410.0
2020-04-09 06:49:00                          -      0       5636960.0               -          0
2020-04-09 06:50:00                          -      0       5636960.0               -          0
2020-04-09 06:51:00                          -      0       5636960.0               -          0
...                                        ...    ...             ...             ...        ...
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569270.0               5      100.0
2020-04-30 23:54:00 2020-04-29 16:30:38.559871   7500      18569270.0               5      100.0
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569370.0               5      180.0

This StackOverflow post describes using interpolate and groupby together: https://stackoverflow.com/questions/37057187/pandas-interpolate-within-a-groupby — jsmart, Aug 10 '20 at 13:57
I tried this: df2 = df.groupby('ID').apply(lambda group: group.interpolate(method='linear')), but it didnt work — Vini, Aug 10 '20 at 15:49
the problem is that between the different ids and within an id period I have nan values - so I don't know how to use groupby. — Vini, Aug 11 '20 at 12:15

Filling NaN values of a df based on different conditions

0 Answers0