How to change consecutive repeating values in pandas dataframe series to nan or 0?

Question

I have a pandas dataframe created from measured numbers. When something goes wrong with the measurement, the last value is repeated. I would like to do two things:
1. Change all repeating values either to nan or 0.
2. Keep the first repeating value and change all other values nan or 0.

I have found solutions using "shift" but they drop repeating values. I do not want to drop repeating values.My data frame looks like this:

df = pd.DataFrame(np.random.randn(15, 3))
df.iloc[4:8,0]=40
df.iloc[12:15,1]=22
df.iloc[10:12,2]=0.23

giving a dataframe like this:

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5   40.000000  -0.074763 -0.840403
6   40.000000   0.709794 -1.000048
7   40.000000   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340  0.230000
12   0.116258  22.000000  1.119744
13  -0.501180  22.000000  0.558941
14   0.551586  22.000000 -0.993749

what I would like to be able to do is write some code that would filter the data and give me a data frame like this:

           0         1         2
0   1.239916  1.109434  0.305490
1   0.248682  1.472628  0.630074
2  -0.028584 -1.116208  0.074299
3  -0.784692 -0.774261 -1.117499
4        NaN  0.283084 -1.495734
5        NaN -0.074763 -0.840403
6        NaN  0.709794 -1.000048
7        NaN  0.920943  0.681230
8  -0.701831  0.547689 -0.128996
9  -0.455691  0.610016  0.420240
10 -0.856768 -1.039719       NaN
11  1.187208  0.964340       NaN
12  0.116258       NaN  1.119744
13 -0.501180       NaN  0.558941
14  0.551586       NaN -0.993749

or even better keep the first value and change the rest to NaN. Like this:

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5         NaN  -0.074763 -0.840403
6         NaN   0.709794 -1.000048
7         NaN   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340       NaN
12   0.116258  22.000000  1.119744
13  -0.501180        NaN  0.558941
14   0.551586        NaN -0.993749

could you share your attempt with the shift function that did not work for you. my suggestion involves the shift function, and produces output exactly as your desired output. I'm curious to see what you tried with the shift function. — Haleemur Ali, Jan 24 '18 at 17:51
I was looking at this solution that used the "shift" command: — getaglow, Jan 24 '18 at 20:05
https://stackoverflow.com/questions/19463985/pandas-drop-consecutive-duplicates — getaglow, Jan 24 '18 at 20:05

score 4 · Accepted Answer · answered Jan 24 '18 at 17:44

using shift & mask:

df.shift(1) == df compares the next row to the current for consecutive duplicates.

df.mask(df.shift(1) == df)

# outputs    
            0          1         2
0    0.365329   0.153527  0.143244
1    0.688364   0.495755  1.065965
2    0.354180  -0.023518  3.338483
3   -0.106851   0.296802 -0.594785
4   40.000000   0.149378  1.507316
5         NaN  -1.312952  0.225137
6         NaN  -0.242527 -1.731890
7         NaN   0.798908  0.654434
8    2.226980  -1.117809 -1.172430
9   -1.228234  -3.129854 -1.101965
10   0.393293   1.682098  0.230000
11  -0.029907  -0.502333       NaN
12   0.107994  22.000000  0.354902
13  -0.478481        NaN  0.531017
14  -1.517769        NaN  1.552974

if you want to remove all the consecutive duplicates, test that the previous row is also the same as the current row

df.mask((df.shift(1) == df) | (df.shift(-1) == df))

I can't thank you enough!!! You are going to make me look good. Thank you. — getaglow, Jan 24 '18 at 20:11
honestly, this is one of those answers that has so many applications that this immediately goes into my pandas toolbelt. the `mask` and `shift` combo is so simple - yet so powerful. — MattR, Jan 25 '18 at 14:54

piRSquared · Answer 2 · 2018-01-24T17:50:26.090

2

Option 1
Specialized solution using diff. Get's at the final desired output.

df.mask(df.diff().eq(0))

            0          1         2
0    1.239916   1.109434  0.305490
1    0.248682   1.472628  0.630074
2   -0.028584  -1.116208  0.074299
3   -0.784692  -0.774261 -1.117499
4   40.000000   0.283084 -1.495734
5         NaN  -0.074763 -0.840403
6         NaN   0.709794 -1.000048
7         NaN   0.920943  0.681230
8   -0.701831   0.547689 -0.128996
9   -0.455691   0.610016  0.420240
10  -0.856768  -1.039719  0.230000
11   1.187208   0.964340       NaN
12   0.116258  22.000000  1.119744
13  -0.501180        NaN  0.558941
14   0.551586        NaN -0.993749

edited Jan 24 '18 at 17:50

answered Jan 24 '18 at 17:20

piRSquared

285,575
57
475
624

1

`pd.Series.duplicated` would pick up other series elements that appear more than once, even if they are not consecutive. I'm not sure that is what the OP is looking for, as he mentions repeating values. – Haleemur Ali Jan 24 '18 at 17:49
@HaleemurAli fixed – piRSquared Jan 24 '18 at 17:52

How to change consecutive repeating values in pandas dataframe series to nan or 0?

2 Answers2

Linked