0

Trying to figure out something complex with pandas - I have this sample dataframe:

Date             Value      Diff
4/2/2019 17:00   864        57      
4/2/2019 17:15   864        0
4/2/2019 17:30   864        0
4/2/2019 17:45   864        0
4/2/2019 18:00   864        0
...
5/2/2019 07:00   864        0
5/2/2019 07:15   864        0
5/2/2019 07:30   864        0
5/2/2019 07:45   864        0
5/2/2019 08:00   864        0
5/2/2019 08:15   864        0
5/2/2019 08:30   1564       700
5/2/2019 08:45   1784       223
5/2/2019 09:00   1904       120
5/2/2019 09:15   2095       191
5/2/2019 09:30   2095       183
5/2/2019 09:45   2095       85
5/2/2019 10:00   2095       58
5/2/2019 10:15   2095       134
5/2/2019 10:30   2555       78
5/2/2019 10:45   2678       123
5/2/2019 11:00   2777       99

The expected dataframe is this:

Date             Value      Diff      NewCol1
4/2/2019 17:00   864        57        57
4/2/2019 17:15   864        0         63.63
4/2/2019 17:30   864        0         63.63
4/2/2019 17:45   864        0         63.63
4/2/2019 18:00   864        0         63.63
...
5/2/2019 07:00   864        0         63.63
5/2/2019 07:15   864        0         63.63
5/2/2019 07:30   864        0         63.63
5/2/2019 07:45   864        0         63.63
5/2/2019 08:00   864        0         63.63
5/2/2019 08:15   864        0         63.63
5/2/2019 08:30   1564       700       63.63
5/2/2019 08:45   1784       223       223
5/2/2019 09:00   1904       120       120
5/2/2019 09:15   2095       191       191
5/2/2019 09:30   2095       183       183
5/2/2019 09:45   2095       85        85
5/2/2019 10:00   2095       58        58
5/2/2019 10:15   2095       134       134
5/2/2019 10:30   2555       78        78
5/2/2019 10:45   2678       123       123
5/2/2019 11:00   2777       99        99
...

There are two conditions to create the NewCol1,

  1. When Value is repeated, if Diff is 0, and only between time range from 07:00 to 18:00, count the number of 0 + 1 and the difference of the Value and divide it (in this example, (1564 - 864) / 11). Take note that the + 1 is to account the row after the last 0, which is the row at 08:30 and this sort of occurrence will always happen consecutively such as this example patern (example 4/2/2019 17:15 to 5/2/2019 08:30)

  2. If Value is repeated, and Diff is more than 0, copy the Diff in that row to NewCol1 including the row after the last repeated Value with Diff more than 0 (example 5/2/2019 09:45 to 5/2/2019 10:30)

Is pandas able to fully do the above conditions to this dataframe? If not, what other Python way can I try? Prefer to avoid if else as efficiently is important since there will be hundred thousands of rows.

SunnyBoiz
  • 514
  • 1
  • 5
  • 14
  • Could you improve your question by following [this guide](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) to provide samples? Also, if `Value` is not repeated what the code is supposed to do, just copy `Diff`? And what if `Diff` is 0? – xicocaio Mar 30 '21 at 17:12

1 Answers1

1

Add column:

df['NewCol1'] = df['Diff']

I know, cause optimization You want to avoid else if - but currently that solution can work:

between_ time - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.between_time.html

apply - https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.apply.html

def convertValues(row):
    if (row['Diff'] > 0):
       return row['Diff']
    if (row['Diff'] == 0 and row['Date'].between_time('7:00','18:00'):
       return row['Value'] - int(row['Value'].name-1) / row.name
   
transformedDf = df.apply(lambda x: convertValues(x), axis=1)
Piotr Żak
  • 2,046
  • 5
  • 18
  • 30