0

I have a df like this:

Timestamp                                 Time  Power    Total Energy              ID     Energy
2020-04-09 06:45:00 2020-04-09 04:44:40.559719   7500       5636690.0               1      140.0    
2020-04-09 06:46:00 2020-04-09 04:44:40.559719   7500       5636710.0               1      160.0    
2020-04-09 06:47:00                        NaT    NaN             NaN             NaN        NaN    
2020-04-09 06:48:00 2020-04-09 04:44:40.559719   7500       5636960.0               1      410.0
2020-04-09 06:49:00                        NaT    NaN             NaN             NaN        NaN
2020-04-09 06:50:00                        NaT    NaN             NaN             NaN        NaN
2020-04-09 06:51:00                        NaT    NaN             NaN             NaN        NaN
...                                        ...    ...             ...             ...        ...
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569270.0               5      100.0
2020-04-30 23:54:00                        NaT    NaN             NaN             NaN        NaN
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569370.0               5      180.0

different cycles (df['ID']) are marked by different IDs. Within a cycle (ID appears both before and after the nan value) the power of the two "surrounding" lines should be averaged, ID and Time should be continued and in the column energy the last existing value of the column energy should be entered. Outside the cycle (ID before != next ID) the power as well as the energy should be set to 0 and the columns ID/Time should be set to '-'. for the column total energy the values should simply be continued.

Expected outcome:

Timestamp                                 Time  Power    Total Energy              ID     Energy
2020-04-09 06:45:00 2020-04-09 04:44:40.559719   7500       5636690.0               1      140.0    
2020-04-09 06:46:00 2020-04-09 04:44:40.559719   7500       5636710.0               1      160.0    
2020-04-09 06:47:00 2020-04-09 04:44:40.559719   7500       5636710.0               1      160.0
2020-04-09 06:48:00 2020-04-09 04:44:40.559719   7500       5636960.0               1      410.0
2020-04-09 06:49:00                          -      0       5636960.0               -          0
2020-04-09 06:50:00                          -      0       5636960.0               -          0
2020-04-09 06:51:00                          -      0       5636960.0               -          0
...                                        ...    ...             ...             ...        ...
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569270.0               5      100.0
2020-04-30 23:54:00 2020-04-29 16:30:38.559871   7500      18569270.0               5      100.0
2020-04-30 23:55:00 2020-04-29 16:30:38.559871   7500      18569370.0               5      180.0
Vini
  • 125
  • 1
  • 8
  • Try with `df.ffill()` or `df.bfill()`. – ipj Aug 10 '20 at 13:50
  • This StackOverflow post describes using interpolate and groupby together: https://stackoverflow.com/questions/37057187/pandas-interpolate-within-a-groupby – jsmart Aug 10 '20 at 13:57
  • I tried this: df2 = df.groupby('ID').apply(lambda group: group.interpolate(method='linear')), but it didnt work – Vini Aug 10 '20 at 15:49
  • the problem is that between the different ids and within an id period I have nan values - so I don't know how to use groupby. – Vini Aug 11 '20 at 12:15

0 Answers0