Pandas dataframe: Remove secondary upcoming same value

Question

I have a dataframe:

On 'col2' I want to keep only the first 1 from the top and replace every 1 below the first one with a 0, such that the output is:

Thank you very much.

score 10 · Accepted Answer · answered Dec 06 '18 at 15:37

10

You can find the index of the first 1 and set others to 0:

mask = df['col2'].eq(1)
df.loc[mask & (df.index != mask.idxmax()), 'col2'] = 0

For better performance, see Efficiently return the index of the first value satisfying condition in array.

answered Dec 06 '18 at 15:37

jpp

159,742
34
281
339

Can you think of a good solution for the case when the index is arbitrary, like `Index(['u', 'v', 'w', 'x', 'y', 'z']` AND col2 could be something like `[2, 0, 0, 1, 3, 1]`? – timgeb Dec 06 '18 at 16:18
@timgeb, To adapt this solution, I think you can use positional indexing (instead of index labels). Something like `df.loc[mask & (np.arange(df.shape[0]) != np.where(mask)[0][0]), 'col2'] = 0`. But I'm sure there are more Pythonic ways. – jpp Dec 06 '18 at 16:28
Ah, I thought of using numpy, too. Just a bit differently. See my case 3. ;) – timgeb Dec 06 '18 at 16:30

timgeb · Answer 2 · 2018-12-06T16:29:49.800

Case 1: df has only ones and zeros in col2 and integer indexes.

>>> df
  col1  col2
0    a     0
1    b     1
2    c     1
3    d     0
4    c     1
5    d     0

You can use:

>>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
>>> df
  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

Case2: df can have all kinds of values in col2 and has integer indexes.

>>> df # demo dataframe
  col1  col2
0    a     0
1    b     1
2    c     2
3    d     2
4    c     3
5    d     3

You can use:

>>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
>>> df
  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

Case 3: df can have all kinds of values in col2 and has an arbitrary index.

>>> df
  col1  col2
u    a    -1
v    b     1
w    c     2
x    d     2
y    c     3
z    d     3

You can use:

>>> df['col2'].iloc[(df['col2'].values == 1).argmax() + 1:] = 0
>>> df
  col1  col2
u    a    -1
v    b     1
w    c     0
x    d     0
y    c     0
z    d     0

piRSquared · Answer 3 · 2018-12-06T15:56:32.060

4

`np.flatnonzero`

Because I thought we needed more answers

df.loc[df.index[np.flatnonzero(df.col2)[1:]], 'col2'] -= 1
df

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

Same thing but a little more sneaky.

df.col2.values[np.flatnonzero(df.col2.values)[1:]] -= 1
df

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

edited Dec 06 '18 at 15:56

answered Dec 06 '18 at 15:51

piRSquared

285,575
57
475
624

user3483203 · Answer 4 · 2018-12-06T15:53:04.653

3

You can use numpy for an effficient solution:

a = df.col2.values
b = np.zeros_like(a)
b[a.argmax()] = 1
df.assign(col2=b)

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

edited Dec 06 '18 at 15:53

answered Dec 06 '18 at 15:39

user3483203

50,081
9
65
94

score 3 · Answer 5 · answered Dec 06 '18 at 15:41

3

Using drop_duplicates with reindex

df.col2=df.col2.drop_duplicates().reindex(df.index,fill_value=0)
df
Out[1078]: 
  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

answered Dec 06 '18 at 15:41

BENY

317,841
20
164
234

score 1 · Answer 6 · answered Dec 06 '18 at 15:42

1

i like this too

data['col2'][np.where(data['col2'] == 1)[0][0]+1:] = 0

answered Dec 06 '18 at 15:42

iamklaus

3,720
2
12
21

1

Chained indexing is *not* recommended. – jpp Dec 06 '18 at 16:41

score 1 · Answer 7 · answered Dec 06 '18 at 15:55

1

Sooo many options, here's mine... almost the same as timgebs answer (found independently), but still different ;)

Find the index of col2 that has the first occurence of a 1, and change all row values after that index to 0:

df['col2'].iloc[df.col2.idxmax()+1:] = 0

answered Dec 06 '18 at 15:55

Sander van den Oord

10,986
5
51
96

Be careful, this sets *all* values to `0` after the specified index, not just the ones equal to `1`. Though that's the same with some other answers too. – jpp Dec 06 '18 at 16:42
Totally agree. Your solution is more general. – Sander van den Oord Dec 06 '18 at 17:42

score 0 · Answer 8 · answered Dec 06 '18 at 15:43

0

id = list(df["col2"]).index(1)
df.iloc[id+1:]["col2"].replace(1,0,inplace=True)

answered Dec 06 '18 at 15:43

shyamrag cp

102
9

3

While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Nic3500 Dec 06 '18 at 16:00
Chained indexing is *not* recommended. – jpp Dec 06 '18 at 16:41

Pandas dataframe: Remove secondary upcoming same value

8 Answers8

np.flatnonzero

`np.flatnonzero`