1

I have a data frame as shown below

B_ID    Session    no_show   cumulative_no_show
1       s1         0.4       0.4
2       s1         0.6       1.0
3       s1         0.2       1.2
4       s1         0.1       1.3
5       s1         0.4       1.7
6       s1         0.2       1.9
7       s1         0.3       2.2
10      s2         0.3       0.3
11      s2         0.4       0.7
12      s2         0.3       1.0
13      s2         0.6       1.6
14      s2         0.2       1.8
15      s2         0.5       2.3

where cumulative_no_show is the cumulative sum of no_show.

From the above I would like to create a new column called u_no_show based on below condition.

Whenever cumulative_no_show >= 0.8, then subtract 1 from next cumulative_no_show. and so on.

Expected Output:

B_ID    Session    no_show   cumulative_no_show   u_no_show
1       s1         0.4       0.4                  0.4
2       s1         0.6       1.0                  1.0
3       s1         0.2       1.2                  0.2
4       s1         0.1       1.3                  0.3
5       s1         0.4       1.7                  0.7
6       s1         0.2       1.9                  0.9
7       s1         0.3       2.2                  0.2
10      s2         0.3       0.3                  0.3
11      s2         0.4       0.7                  0.7
12      s2         0.3       1.0                  1.0
13      s2         0.6       1.6                  0.6
14      s2         0.2       1.8                  1.8
15      s2         0.5       2.3                  0.3
Danish
  • 2,719
  • 17
  • 32

1 Answers1

1

I assume you want to perform this per Session. I'm not sure there is a vectorized solution so I would go by creating a function that iterate over the values and do the subtraction when needed, then use groupby.apply:

def create_u_no_show (ser):
    # convert to numpy aray and iterate
    arr_ns = ser.to_numpy()
    for i in range(len(arr_ns)-1):
        # check if the condition is met
        if arr_ns[i]>0.8:
            # remove 1 to all the next values if the condition is met
            arr_ns[i+1:] -= 1
    # return a serie with the right index
    return pd.Series(arr_ns, ser.index)

df['u_no_show'] = df.groupby(['Session'])['cumulative_no_show'].apply(create_u_no_show)

print (df) 
    B_ID Session  no_show  cumulative_no_show  u_no_show
0      1      s1      0.4                 0.4        0.4
1      2      s1      0.6                 1.0        1.0
2      3      s1      0.2                 1.2        0.2
3      4      s1      0.1                 1.3        0.3
4      5      s1      0.4                 1.7        0.7
5      6      s1      0.2                 1.9        0.9
6      7      s1      0.3                 2.2        0.2
7     10      s2      0.3                 0.3        0.3
8     11      s2      0.4                 0.7        0.7
9     12      s2      0.3                 1.0        1.0
10    13      s2      0.6                 1.6        0.6
11    14      s2      0.2                 1.8        0.8
12    15      s2      0.5                 2.3        1.3
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • if possible pls look into this question... It is similar question whenever we subtract -1, dont change slot number.. https://stackoverflow.com/questions/61342916/groupby-cumulative-sum-in-pandas-based-on-specific-condition – Danish Apr 22 '20 at 13:00
  • added similar question. if possible pls look into this https://stackoverflow.com/questions/61441460/create-new-rows-based-the-values-of-one-of-the-column-in-pandas – Danish Apr 26 '20 at 13:25
  • added new question, pls help me to solve during your free time https://stackoverflow.com/questions/63236080/replace-the-column-values-based-on-the-list-of-dictionary-and-specific-date-cond – Danish Aug 03 '20 at 19:58