Creating a new column based on lagged/changing variable

Question

UPDATED

I have a df which looks like the following:

print(df)

           id  rank  a  
date                      
2000-01-01  1   1.0  0  
2000-01-01  2   3.0  0  
2000-01-01  3   2.0  0  
2000-01-01  4   0.0  0  
2000-01-02  1   2.0  0
2000-01-02  2   3.0  0  
2000-01-02  3   2.0  0  
2000-01-02  4   1.0  0
2000-01-03  1   3.0  0 
2000-01-03  2   2.0  0 
2000-01-03  3   1.0  0 
2000-01-03  4   1.0  0
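
To make the example reproducible, here is a minimal sketch that rebuilds the frame shown above. The construction is an assumption, since only the printed output was given.

```python
import pandas as pd

# Three dates, four ids per date, matching the printed frame above.
dates = pd.to_datetime(["2000-01-01"] * 4 + ["2000-01-02"] * 4 + ["2000-01-03"] * 4)

df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4] * 3,
        "rank": [1.0, 3.0, 2.0, 0.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 1.0],
        "a": 0,  # placeholder column, all zeros before any change is flagged
    },
    index=pd.Index(dates, name="date"),
)
print(df)
```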

I want to create a new variable a that equals 1 if rank changes from the previous month. For example, if rank is 3.0 on 2000-01-01 and changes to 2.0 on 2000-01-02, that change should correspond to a 1 in a; if there is no change, a should stay 0. I also want to group by id. Also, rank can only increase by 1 over each period.

Expected Output

           id  rank  a  
date                      
2000-01-01  1   1.0  1 
2000-01-01  2   3.0  1  
2000-01-01  3   2.0  1  
2000-01-01  4   0.0  1  
2000-01-02  1   2.0  1
2000-01-02  2   3.0  0  
2000-01-02  3   2.0  0  
2000-01-02  4   1.0  1
2000-01-03  1   3.0  1 
2000-01-03  2   2.0  1 
2000-01-03  3   1.0  1 
2000-01-03  4   1.0  0

Any help would be awesome!

You should probably clarify what's the expected output if the rank changes by >1 in a single step. — moonGoose, Mar 28 '19 at 07:26

jezrael · Accepted Answer · 2019-03-28T12:40:36.923

Use Series.diff with Series.abs and, if the difference can be larger than 1, add Series.clip:

df['a'] = df['rank'].diff().abs().clip(upper=1)

If difference is always only 1:

df['a'] = df['rank'].diff().abs()

And if the first value needs to be set to 0:

df['a'] = df['rank'].diff().fillna(0).ne(0).astype(int)
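
To see how the three variants differ, here is a small sketch on made-up values (not the asker's data) that include a jump larger than 1:

```python
import pandas as pd

# Made-up ranks; the step from 2.0 to 5.0 is a jump of 3.
s = pd.Series([3.0, 2.0, 2.0, 5.0, 4.0], name="rank")

clipped = s.diff().abs().clip(upper=1)        # NaN, 1, 0, 1, 1 (jump capped at 1)
raw = s.diff().abs()                          # NaN, 1, 0, 3, 1 (actual step size)
flag = s.diff().fillna(0).ne(0).astype(int)   # 0, 1, 0, 1, 1 (integer indicator)

print(pd.DataFrame({"rank": s, "clipped": clipped, "raw": raw, "flag": flag}))
```

The first two keep NaN in the first row and return floats; only the last one yields a clean 0/1 integer column.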

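EDITED answer, comparing within each id via groupby (the first row of each id has no previous value, so its diff is NaN and ne(0) marks it as 1):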

df['a1'] = df.groupby('id')['rank'].diff().ne(0).astype(int)
print (df)
          date  id  rank  a  a1
0   2000-01-01   1   1.0  1   1
1   2000-01-01   2   3.0  1   1
2   2000-01-01   3   2.0  1   1
3   2000-01-01   4   0.0  1   1
4   2000-01-02   1   2.0  1   1
5   2000-01-02   2   3.0  0   0
6   2000-01-02   3   2.0  0   0
7   2000-01-02   4   1.0  1   1
8   2000-01-03   1   3.0  1   1
9   2000-01-03   2   2.0  1   1
10  2000-01-03   3   1.0  1   1
11  2000-01-03   4   1.0  0   0
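
A self-contained sketch of the grouped version, assuming the rows are already sorted by date within each id. The column name a1_first0 is hypothetical and only illustrates the variant where the first observation per id is 0 instead of 1.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2000-01-01"] * 4 + ["2000-01-02"] * 4 + ["2000-01-03"] * 4),
    "id": [1, 2, 3, 4] * 3,
    "rank": [1.0, 3.0, 2.0, 0.0, 2.0, 3.0, 2.0, 1.0, 3.0, 2.0, 1.0, 1.0],
})

# Difference to the previous row of the same id; NaN for each id's first row.
step = df.groupby("id")["rank"].diff()

df["a1"] = step.ne(0).astype(int)                    # NaN != 0, so first rows become 1
df["a1_first0"] = step.fillna(0).ne(0).astype(int)   # first rows become 0 instead
print(df)
```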

DirtyBit · Answer 2 · 2019-03-28T07:30:16.357

Using shift:

import pandas as pd
df = pd.DataFrame({"rank": [3.0, 2.0, 2.0, 2.0, 1.0]})

df['a'] = (df['rank'] - df['rank'].shift(1)).abs()
print(df)

OUTPUT:

   rank    a
0   3.0  NaN
1   2.0  1.0
2   2.0  0.0
3   2.0  0.0
4   1.0  1.0

EDIT:

If you want to replace the NaN with empty string:

df['a'] = df['a'].fillna('').astype(str)

EDIT 2:

Using np.where: the snippet above gives the actual difference, whereas this approach gives 1 where there is a difference and 0 otherwise:

import numpy as np
import pandas as pd
df = pd.DataFrame({"rank": [3.0, 12.0, 2.0, 2.0, 1.0]})

df['a'] = np.where((df['rank'] - df['rank'].shift(1)).abs() > 0, 1, 0)
print(df)
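
For reference, a runnable sketch of the np.where approach with its intermediate values spelled out. The name change is introduced here for illustration only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"rank": [3.0, 12.0, 2.0, 2.0, 1.0]})

# Absolute step to the previous row: NaN, 9, 10, 0, 1
change = (df["rank"] - df["rank"].shift(1)).abs()

# NaN > 0 evaluates to False, so the first row gets 0 without any fillna.
df["a"] = np.where(change > 0, 1, 0)
print(df["a"].tolist())  # [0, 1, 1, 0, 1]
```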

edited Mar 28 '19 at 07:30

answered Mar 28 '19 at 07:01

DirtyBit

This assumes rank changes by exactly +-1 or not at all – moonGoose Mar 28 '19 at 07:20
It runs, but outputs eg. 2 whereas I believe what is being requested is an indicator function for a change having occurred. – moonGoose Mar 28 '19 at 07:25

Sociopath · Answer 3 · 2019-03-28T07:15:22.607

You can do it using np.select:

import numpy as np

df0 = df.shift()
cond1 = df0['rank'].isnull()   # first row has no previous value; it is set to 0
cond2 = df0['rank'] != df['rank']

df['a'] = np.select([cond1,cond2], [0,1], 0)

print(df)
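
A self-contained sketch of the same idea, using sample values borrowed from another answer in this thread. The name prev is an illustrative choice.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"rank": [3.0, 2.0, 2.0, 2.0, 1.0]})
prev = df["rank"].shift()

# np.select takes the first matching condition per row:
#   no previous value -> 0, value differs from previous -> 1, otherwise default 0.
df["a"] = np.select([prev.isna(), prev != df["rank"]], [0, 1], default=0)
print(df["a"].tolist())  # [0, 1, 0, 0, 1]
```

The order of the conditions matters: NaN != 3.0 is True, so checking prev.isna() first is what keeps the first row at 0.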

edited Mar 28 '19 at 07:15

answered Mar 28 '19 at 07:09

Sociopath

score 0 · Answer 4 · answered Mar 28 '19 at 07:02

df['a'] = (df['rank'].shift() == df['rank']).map({True:0, False:1})

Note that the first element of column a is 1, because the first row has no previous value to compare against. Set it to 0 if you don't need it.
Hope this helps.
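
A short sketch of what that looks like on made-up sample values, including one possible way to reset the first element:

```python
import pandas as pd

df = pd.DataFrame({"rank": [3.0, 2.0, 2.0, 2.0, 1.0]})

# NaN == 3.0 is False, so the first row maps to 1.
df["a"] = (df["rank"].shift() == df["rank"]).map({True: 0, False: 1})
before = df["a"].tolist()   # [1, 1, 0, 0, 1]

# Reset the first row if it should not count as a change.
df.loc[df.index[0], "a"] = 0
after = df["a"].tolist()    # [0, 1, 0, 0, 1]
print(before, after)
```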

Ting

Loochie · Answer 5 · 2019-03-28T07:48:13.490

This can also be done with Series.diff(), converting the resulting booleans directly to integers. Note that eq(-1) flags only a decrease of exactly 1:

df['a']=df['rank'].diff().eq(-1).astype(int)
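
To illustrate what eq(-1) does and does not catch, here is a sketch on made-up values that contrasts it with a general change indicator. The names down_one and any_change are illustrative.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 2.0, 1.0, 3.0])   # diff: NaN, 1, 0, -1, 2

down_one = s.diff().eq(-1).astype(int)             # only a drop of exactly 1
any_change = s.diff().fillna(0).ne(0).astype(int)  # any non-zero step

print(down_one.tolist())    # [0, 0, 0, 1, 0]
print(any_change.tolist())  # [0, 1, 0, 1, 1]
```

If increases should also be flagged, as in the expected output of the question, the ne(0) form is probably the closer fit.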

edited Mar 28 '19 at 07:48

answered Mar 28 '19 at 07:37

Loochie
