I'm trying to iterate through a DataFrame and when a value changes, increment a counter, then set a new column equal to that value. I'm able to get this to work using a global counter, like so:

def change_ind(row):
    global prev_row
    global k

    if row['rep'] != prev_row:
        k = k+1
        prev_row = row['rep']
    return k

But when I try to pass arguments to the apply function, as below, it no longer works. It seems to reset the values of k and prev_row each time it operates on a new row. Is there a way to pass arguments to the function and get the result I'm looking for? Or is there a better way to do this altogether?

def change_ind(row, k, prev_row):    
    if row != prev_row:
        k = k+1
        prev_row = row
    return k
IanS
  • IIUC you can do the same doing `df['rep'] = (df['rep'] != df['rep'].shift()).cumsum()` – EdChum Jul 08 '16 at 12:54
  • That's my understanding too, @EdChum should put that in an answer. Also as a side note, for future reference, you can use `k += 1` to add to a counter. – Jeff Jul 08 '16 at 12:59

1 Answer

You can achieve the same thing using shift and cumsum, which will be significantly faster than looping:

In [107]:
df = pd.DataFrame({'rep':[0,1,1,1,2,3,2,3,4,5,1]})
df

Out[107]:
    rep
0     0
1     1
2     1
3     1
4     2
5     3
6     2
7     3
8     4
9     5
10    1

In [108]:    
df['rep_f'] = (df['rep']!=df['rep'].shift()).cumsum()-1
df

Out[108]:
    rep  rep_f
0     0      0
1     1      1
2     1      1
3     1      1
4     2      2
5     3      3
6     2      4
7     3      5
8     4      6
9     5      7
10    1      8
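
To see why the one-liner works, it can help to split it into its intermediate steps. The following is only a sketch on the same data; the names `prev` and `changed` are illustrative and not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({'rep': [0, 1, 1, 1, 2, 3, 2, 3, 4, 5, 1]})

# Value from the previous row; the first row has no predecessor, so it is NaN.
prev = df['rep'].shift()

# True wherever the value differs from the row above. The first row is
# always True, because a number never compares equal to NaN.
changed = df['rep'] != prev

# Summing the booleans cumulatively counts the changes seen so far;
# subtracting 1 makes the first run start at 0.
df['rep_f'] = changed.cumsum() - 1
```

Each `True` in `changed` marks the start of a new run, so the cumulative sum stays constant within a run of repeated values and steps up by one at every change.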
EdChum
  • Interesting! I'll definitely use this. I'm using quite a few of these functions, so I'm still curious whether there's a way to pass non-global variables to an apply function and not have them overwritten on each iteration. I'll have to think about how I can use shift here in the future, though. – Chris Hedenberg Jul 08 '16 at 13:59
  • Presumably, if you declare `k` outside of your func, then it should update, no? `k = 0 prev_row = 0 def change_ind(row): if row != prev_row: k = k+1 prev_row = row return k` – EdChum Jul 08 '16 at 14:01
  • Yeah, it does. It gets messy to have to redeclare these every time I call a function, so I was hoping I could pass them through the apply function. – Chris Hedenberg Jul 08 '16 at 14:09
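
On the follow-up about avoiding globals: extra arguments are handed to the function afresh on every call, and `k = k+1` only rebinds the local name, so nothing carries over between rows. One possible way to keep the state without globals is a small callable object. This is a sketch, not something from the thread: the class name `ChangeCounter` and its attributes are hypothetical, and it assumes `Series.apply` calls the function once per element, in order.

```python
import pandas as pd

class ChangeCounter:
    """Hypothetical helper: counts how often the value it is called with changes."""

    def __init__(self, start=-1):
        self.k = start          # counter; -1 so the first value is labelled 0
        self.prev = None        # last value seen
        self.started = False    # becomes True after the first call

    def __call__(self, value):
        # Increment on the very first value and whenever the value changes.
        if not self.started or value != self.prev:
            self.k += 1
            self.prev = value
            self.started = True
        return self.k

df = pd.DataFrame({'rep': [0, 1, 1, 1, 2, 3, 2, 3, 4, 5, 1]})

# A fresh counter per call to apply, so no state leaks between uses.
df['rep_f'] = df['rep'].apply(ChangeCounter())
```

Because the state lives on the instance, there is nothing to redeclare before each use; creating a new `ChangeCounter()` resets it. For large frames the shift/cumsum approach in the answer is still likely to be much faster.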