I am having trouble computing the number of days in a row until a condition is met. The table below illustrates the problem: Gap done is the messy column I obtained with the solution from there, and Expected Gap is the output I want to obtain.

+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
| Player |   Result   |        Date         | Gap done |                                         Expected Gap                                         |
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+
| K2000  | Lose       | 2015-11-13 13:42:00 | Nan      | Nan/0                                                                                        |
| K2000  | Lose       | 2016-03-23 16:40:00 | 131.0    | 131.0                                                                                        |
| K2000  | Lose       | 2016-05-16 19:17:00 | 54.0     | 185.0                                                                                        |
| K2000  | Win        | 2016-06-09 19:36:00 | 54.0     | 239.0 #he always lose before                                                                 |
| K2000  | Win        | 2016-06-30 14:05:00 | 54.0     | 54.0 #because he won last time, it's 54 days btw this current date and the last time he won. |
| K2000  | Lose       | 2016-07-29 16:20:00 | 29.0     | 29.0                                                                                         |
| K2000  | Win        | 2016-10-08 17:48:00 | 29.0     | 58.0                                                                                         |
| Kssis  | Lose       | 2007-02-25 15:05:00 | Nan      | Nan/0                                                                                        |
| Kssis  | Lose       | 2007-04-25 16:07:00 | 59.0     | 59.0                                                                                         |
| Kssis  | Not ranked | 2007-06-01 16:54:00 | 37.0     | 96.0                                                                                         |
| Kssis  | Lose       | 2007-09-09 14:33:00 | 99.0     | 195.0                                                                                        |
| Kssis  | Lose       | 2008-04-06 16:27:00 | 210.0    | 405.0                                                                                        |
+--------+------------+---------------------+----------+----------------------------------------------------------------------------------------------+

The issue with the solution there is that it does not really compute anything from the dates. It only works because the dates in its example are always separated by exactly 1 day.

Sure, I adapted it with:

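def sum_days_in_row_with_condition(g):
    # Sort first so that diff() compares chronologically adjacent rows
    sorted_g = g.sort_values(by='date', ascending=True)
    condition = sorted_g['Result'] == 'Win'
    # Whole days since the previous row, blanked on Win rows, then forward-filled
    sorted_g['days-in-a-row'] = sorted_g['date'].diff().dt.days.where(~condition).ffill()
    return sorted_g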

But as the table shows, the result is messy.

So I thought about a solution, but it needs global variables (defined outside the function), and that is a bit tedious.

Can anyone help me solve this problem in a simpler way?


Pandas version: 0.23.4 Python version: 3.7.4

AvyWam

1 Answer


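IIUC, you need a boolean mask m1 that is True where a Win row is preceded by another Win row. From m1, build a group ID s that increments each time m1 changes, which separates the runs of wins from the rest. Then group by Player and s and take the cumsum of Gap done.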

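# True on Win rows
m = df.Result.eq('Win')
# True where a Win directly follows another Win
m1 = m & m.shift()
# Group ID that increments whenever m1 changes value
s = m1.ne(m1.shift()).cumsum()
df['Expected Gap'] = df.groupby(['Player', s])['Gap done'].cumsum()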

Out[808]:
   Player      Result                 Date  Gap done  Expected Gap
0   K2000        Lose  2015-11-13 13:42:00      NaN           NaN
1   K2000        Lose  2016-03-23 16:40:00    131.0         131.0
2   K2000        Lose  2016-05-16 19:17:00     54.0         185.0
3   K2000         Win  2016-06-09 19:36:00     54.0         239.0
4   K2000         Win  2016-06-30 14:05:00     54.0          54.0
5   K2000        Lose  2016-07-29 16:20:00     29.0          29.0
6   K2000         Win  2016-10-08 17:48:00     29.0          58.0
7   Kssis        Lose  2007-02-25 15:05:00      NaN           NaN
8   Kssis        Lose   2007-04-25 6:07:00     59.0          59.0
9   Kssis  Not-ranked  2007-06-01 16:54:00     37.0          96.0
10  Kssis        Lose  2007-09-09 14:33:00     99.0         195.0
11  Kssis        Lose  2008-04-06 16:27:00    210.0         405.0
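
For anyone who wants to check this end to end, here is a minimal, self-contained sketch that rebuilds the question's sample data, derives Gap done from the Date column, and then applies the grouping above. It rests on a few assumptions: the frame is sorted by Player and Date, the day differences are taken per player, and `shift(fill_value=False)` is used to keep the masks strictly boolean. That argument needs a pandas release newer than the 0.23.4 mentioned in the question.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "Player": ["K2000"] * 7 + ["Kssis"] * 5,
    "Result": ["Lose", "Lose", "Lose", "Win", "Win", "Lose", "Win",
               "Lose", "Lose", "Not ranked", "Lose", "Lose"],
    "Date": pd.to_datetime([
        "2015-11-13 13:42:00", "2016-03-23 16:40:00", "2016-05-16 19:17:00",
        "2016-06-09 19:36:00", "2016-06-30 14:05:00", "2016-07-29 16:20:00",
        "2016-10-08 17:48:00", "2007-02-25 15:05:00", "2007-04-25 16:07:00",
        "2007-06-01 16:54:00", "2007-09-09 14:33:00", "2008-04-06 16:27:00",
    ]),
})

# Assumption: rows must be in chronological order within each player
df = df.sort_values(["Player", "Date"]).reset_index(drop=True)

m = df["Result"].eq("Win")

# Gap done: whole days since the player's previous row, blanked on Win rows
# and forward-filled within each player (mirrors the question's function)
day_diff = df.groupby("Player")["Date"].diff().dt.days
df["Gap done"] = day_diff.where(~m).groupby(df["Player"]).ffill()

# Grouping from the answer: a new group starts whenever m1 changes value
m1 = m & m.shift(fill_value=False)
s = m1.ne(m1.shift(fill_value=False)).cumsum()
df["Expected Gap"] = df.groupby(["Player", s])["Gap done"].cumsum()

print(df)
```

On this sample, both computed columns match the Gap done and Expected Gap columns of the table in the question.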
Andy L.
  • Thank you a lot. I want to add something. The second and third lines gave unexpected results. I do not mean that your computation was wrong for the example shown in my question; I mean it gave unexpected results with the real data I use. So I replaced them with only two lines: `m = df.Result.eq('Win')` and `s = m.shift().cumsum()`. To be honest, there are still some weird results, like a `-1` in the first row of the `Expected Gap` column, but that is correctable. Well, it was a great lead anyway, thanks again. – AvyWam Nov 19 '19 at 10:29
  • You are welcome. It's interesting. It's great you figured it out :) – Andy L. Nov 19 '19 at 17:11
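
One possible explanation for the difference reported in the comments, which the thread itself does not confirm: the `m1` mask only reacts to two consecutive wins, so when wins and losses alternate the running sum is never reset. The sketch below compares both groupings on made-up data (a hypothetical player "P1" with a gap of 10 days on every row after the first). It uses `shift(fill_value=False)` instead of the bare `shift()` from the comment to keep the mask boolean, which needs a pandas release newer than 0.23.4.

```python
import pandas as pd

# Hypothetical data: wins and losses alternate, gaps are invented
df = pd.DataFrame({
    "Player": ["P1"] * 5,
    "Result": ["Lose", "Win", "Lose", "Win", "Lose"],
    "Gap done": [float("nan"), 10.0, 10.0, 10.0, 10.0],
})

m = df["Result"].eq("Win")

# Grouping from the answer: only a Win that follows a Win starts a new group
m1 = m & m.shift(fill_value=False)
s_answer = m1.ne(m1.shift(fill_value=False)).cumsum()
df["answer"] = df.groupby(["Player", s_answer])["Gap done"].cumsum()

# Variant from the comment: every row that follows a Win starts a new group
s_variant = m.shift(fill_value=False).cumsum()
df["variant"] = df.groupby(["Player", s_variant])["Gap done"].cumsum()

print(df)
```

Here the `answer` column keeps accumulating (10, 20, 30, 40) because no two wins are adjacent, while the `variant` column restarts after each win (10, 20, 10, 20), which is closer to "days since the last win" as described in the question.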