I'm trying to figure out something complex with pandas. I have this sample dataframe:
Date Value Diff
4/2/2019 17:00 864 57
4/2/2019 17:15 864 0
4/2/2019 17:30 864 0
4/2/2019 17:45 864 0
4/2/2019 18:00 864 0
...
5/2/2019 07:00 864 0
5/2/2019 07:15 864 0
5/2/2019 07:30 864 0
5/2/2019 07:45 864 0
5/2/2019 08:00 864 0
5/2/2019 08:15 864 0
5/2/2019 08:30 1564 700
5/2/2019 08:45 1784 223
5/2/2019 09:00 1904 120
5/2/2019 09:15 2095 191
5/2/2019 09:30 2095 183
5/2/2019 09:45 2095 85
5/2/2019 10:00 2095 58
5/2/2019 10:15 2095 134
5/2/2019 10:30 2555 78
5/2/2019 10:45 2678 123
5/2/2019 11:00 2777 99
The expected dataframe is this:
Date Value Diff NewCol1
4/2/2019 17:00 864 57 57
4/2/2019 17:15 864 0 63.63
4/2/2019 17:30 864 0 63.63
4/2/2019 17:45 864 0 63.63
4/2/2019 18:00 864 0 63.63
...
5/2/2019 07:00 864 0 63.63
5/2/2019 07:15 864 0 63.63
5/2/2019 07:30 864 0 63.63
5/2/2019 07:45 864 0 63.63
5/2/2019 08:00 864 0 63.63
5/2/2019 08:15 864 0 63.63
5/2/2019 08:30 1564 700 63.63
5/2/2019 08:45 1784 223 223
5/2/2019 09:00 1904 120 120
5/2/2019 09:15 2095 191 191
5/2/2019 09:30 2095 183 183
5/2/2019 09:45 2095 85 85
5/2/2019 10:00 2095 58 58
5/2/2019 10:15 2095 134 134
5/2/2019 10:30 2555 78 78
5/2/2019 10:45 2678 123 123
5/2/2019 11:00 2777 99 99
...
There are two conditions to create NewCol1:

1. When Value is repeated and Diff is 0, and only within the time range 07:00 to 18:00: count the number of 0 rows, add 1, and divide the difference in Value by that count (in this example, (1564 - 864) / 11). Take note that the + 1 is to account for the row after the last 0, which is the row at 08:30. This sort of occurrence will always happen consecutively, as in this example pattern (4/2/2019 17:15 to 5/2/2019 08:30).
2. If Value is repeated and Diff is more than 0, copy the Diff in that row to NewCol1, including the row after the last repeated Value with Diff more than 0 (example: 5/2/2019 09:45 to 5/2/2019 10:30).
Is pandas able to fully handle the above conditions on this dataframe? If not, what other Python approach can I try? I'd prefer to avoid if/else, since efficiency is important: there will be hundreds of thousands of rows.
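To make the rules concrete, here is a rough loop-based sketch of what I'm after (the helper name add_newcol1 is made up, and I've assumed the 07:00–18:00 window is inclusive at both ends so that the 18:00 row counts — adjust if that's wrong). It uses explicit iteration, which is exactly what I'd like to avoid at this scale:

```python
import pandas as pd

def add_newcol1(df):
    # Sketch of the two rules; assumes df is sorted by Date,
    # Date is a datetime64 column, and Diff is numeric.
    out = df["Diff"].astype(float).to_numpy().copy()
    zero = df["Diff"].eq(0).to_numpy()

    # Minutes since midnight, to test the 07:00-18:00 window
    # (inclusive bounds are my assumption).
    mins = df["Date"].dt.hour.to_numpy() * 60 + df["Date"].dt.minute.to_numpy()
    in_window = (mins >= 7 * 60) & (mins <= 18 * 60)

    i, n = 0, len(df)
    while i < n:
        if zero[i]:
            j = i
            while j < n and zero[j]:          # j stops at the first non-zero row
                j += 1
            count = in_window[i:j].sum() + 1  # zeros inside the window, +1 closing row
            if j < n:
                rate = (df["Value"].iloc[j] - df["Value"].iloc[i]) / count
                out[i:j + 1] = rate           # spread over the run and the closing row
            i = j + 1
        else:
            i += 1                            # rule 2: non-zero Diff is copied as-is

    return df.assign(NewCol1=out)
```

Against the sample above (with the elided overnight rows all zero), this should give (1564 - 864) / 11 ≈ 63.63 for the 17:15 through 08:30 block and copy Diff everywhere else. I suspect the Python loop could be replaced by a groupby over run labels, e.g. `(zero != zero.shift()).cumsum()`, for a vectorized version.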