0

I am looking to derive a new row from a current row in my dataframe, and add 1 to the previous row to keep a kind of running total

df['Touch_No'] = np.where((df.Time_btween_steps.isnull()) | (df.Time_btween_steps > 30), 1, df.First_touch.shift().add(1))

I basically want to check if the column value is null, if it is then set that to "First Activity"/resets the counter, if not, add 1 to the "previous activity", to give me a running total of the number of outreach we are doing on specific people:

Expected outcome:

Time Between Steps | Touch_No
     Null.         |.   1
     0             |.   2
     5.4           |.   3
     6.7           |.   4
     2             |.   5
     null          |.   1
     1             |.   2
wjandrea
  • 28,235
  • 9
  • 60
  • 81
  • 1
    I assume you're using Pandas, so I added the [tag:pandas] tag for you. If that's incorrect, you can [edit] to fix it. BTW, welcome to Stack Overflow! Check out the [tour], and [ask] if you want tips. – wjandrea Oct 26 '21 at 17:44

3 Answers3

0

Edited according to your clarification:

df = pd.DataFrame(data=np.array(([None, 0, 5.4, 6.7, 2, None, 1],[50,1,2,3,4,35,1])).T, columns=['Time_btween_steps', 'Touch_No'])
mask = pd.isna(df['Time_btween_steps']) | df['Time_btween_steps']>30 
df['Touch_No'][~mask] += 1
df['Touch_No'][mask] = 1

Returns:

  Time_btween_steps Touch_No
0   None    51
1   0       2
2   5.4     3
3   6.7     4
4   2       5
5   None    36
6   1       2

In my opinion a solution like this is much more readable. We increment by 1 where the condition is not met, and we set the ones where the condition is true to 1. You can combine these into a single line if you wish.

Old answer for posterity.

Here is a simple solution using pandas apply functionality which takes a function.

import pandas as pd

df = pd.DataFrame(data=[1,2,3,4,None,5,0],columns=['test'])
df.test.apply(lambda x: 0 if pd.isna(x) else x+1)

Which returns:

0    2.0
1    3.0
2    4.0
3    5.0
4    0.0
5    6.0
6    1.0

Here I wrote the function in place but if you have more complicated logic, such as resetting if the number is something else, etc., you can write a custom function and pass it in instead of the lambda function. This is not the only way to do it, but if your data frame isn't huge (hundreds of thousands of rows), it should be performant. If you don't want a copy but to overwrite the array simply assign it back by prepending:

df['test'] = before the last line.

If you want the output to be ints, you can also do:

df['test'].astype(int) but be careful about converting None/Null to int.

Emir
  • 373
  • 1
  • 6
  • Thank you for the quick response. So I am trying to add 1 to the previous row in the net new column. That is why I was using the .shift() function. This seems to be adding 1 to the reference column. Let me know if that makes sense – cgphillippi Oct 26 '21 at 18:09
0

Answer using this. Combo of cumsum(), groupBy(), and cumcount()

df = pd.DataFrame(data=[None, 0, 5.4, 6.7, 2, None, 1], columns=['Time_btween_steps'])
df['Touch_No'] = np.where((df.Time_btween_steps.isnull()), (df.Time_btween_steps > 30), 1)
df['consec'] = df['Touch_No'].groupby((df['Touch_No']==0).cumsum()).cumcount()
df.head(10)
user3474165
  • 171
  • 1
  • 3
0

Using np.where, index values with ffill for partitioning and simple rank:

import numpy as np
import pandas as pd

sodf = pd.DataFrame({'time_bw_steps': [None, 0, 5.4, 6.7, 2, None, 1]})
sodf['touch_partition'] = np.where(sodf.time_bw_steps.isna(), sodf.index, np.NaN)
sodf['touch_partition'] = sodf['touch_partition'].fillna(method='ffill')
sodf['touch_no'] = sodf.groupby('touch_partition')['touch_partition'].rank(method='first', ascending='False')
sodf.drop(columns=['touch_partition'], axis='columns', inplace=True)
sodf
Gaurav
  • 1,095
  • 10
  • 19