11

I have a dataframe:

col1  col2
 a     0
 b     1
 c     1
 d     0
 c     1
 d     0

On 'col2' I want to keep only the first 1 from the top and replace every 1 below the first one with a 0, such that the output is:

col1  col2
 a     0
 b     1
 c     0
 d     0
 c     0
 d     0

Thank you very much.

timgeb
  • 76,762
  • 20
  • 123
  • 145
s900n
  • 3,115
  • 5
  • 27
  • 35

8 Answers8

10

You can find the index of the first 1 and set others to 0:

mask = df['col2'].eq(1)
df.loc[mask & (df.index != mask.idxmax()), 'col2'] = 0

For better performance, see Efficiently return the index of the first value satisfying condition in array.

jpp
  • 159,742
  • 34
  • 281
  • 339
  • Can you think of a good solution for the case when the index is arbitrary, like `Index(['u', 'v', 'w', 'x', 'y', 'z']` AND col2 could be something like `[2, 0, 0, 1, 3, 1]`? – timgeb Dec 06 '18 at 16:18
  • @timgeb, To adapt this solution, I think you can use positional indexing (instead of index labels). Something like `df.loc[mask & (np.arange(df.shape[0]) != np.where(mask)[0][0]), 'col2'] = 0`. But I'm sure there are more Pythonic ways. – jpp Dec 06 '18 at 16:28
  • Ah, I thought of using numpy, too. Just a bit differently. See my case 3. ;) – timgeb Dec 06 '18 at 16:30
4

Case 1: df has only ones and zeros in col2 and integer indexes.

>>> df
  col1  col2
0    a     0
1    b     1
2    c     1
3    d     0
4    c     1
5    d     0

You can use:

>>> df.loc[df['col2'].idxmax() + 1:, 'col2'] = 0
>>> df
  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

Case2: df can have all kinds of values in col2 and has integer indexes.

>>> df # demo dataframe
  col1  col2
0    a     0
1    b     1
2    c     2
3    d     2
4    c     3
5    d     3

You can use:

>>> df.loc[(df['col2'] == 1).idxmax() + 1:, 'col2'] = 0
>>> df
  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

Case 3: df can have all kinds of values in col2 and has an arbitrary index.

>>> df
  col1  col2
u    a    -1
v    b     1
w    c     2
x    d     2
y    c     3
z    d     3

You can use:

>>> df['col2'].iloc[(df['col2'].values == 1).argmax() + 1:] = 0
>>> df
  col1  col2
u    a    -1
v    b     1
w    c     0
x    d     0
y    c     0
z    d     0
timgeb
  • 76,762
  • 20
  • 123
  • 145
4

np.flatnonzero

Because I thought we needed more answers

df.loc[df.index[np.flatnonzero(df.col2)[1:]], 'col2'] -= 1
df

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0

Same thing but a little more sneaky.

df.col2.values[np.flatnonzero(df.col2.values)[1:]] -= 1
df

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0
piRSquared
  • 285,575
  • 57
  • 475
  • 624
3

You can use numpy for an effficient solution:

a = df.col2.values
b = np.zeros_like(a)
b[a.argmax()] = 1
df.assign(col2=b)

  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0
user3483203
  • 50,081
  • 9
  • 65
  • 94
3

Using drop_duplicates with reindex

df.col2=df.col2.drop_duplicates().reindex(df.index,fill_value=0)
df
Out[1078]: 
  col1  col2
0    a     0
1    b     1
2    c     0
3    d     0
4    c     0
5    d     0
BENY
  • 317,841
  • 20
  • 164
  • 234
1

i like this too

data['col2'][np.where(data['col2'] == 1)[0][0]+1:] = 0
iamklaus
  • 3,720
  • 2
  • 12
  • 21
1

Sooo many options, here's mine... almost the same as timgebs answer (found independently), but still different ;)

Find the index of col2 that has the first occurence of a 1, and change all row values after that index to 0:

df['col2'].iloc[df.col2.idxmax()+1:] = 0
Sander van den Oord
  • 10,986
  • 5
  • 51
  • 96
0
id = list(df["col2"]).index(1)
df.iloc[id+1:]["col2"].replace(1,0,inplace=True)
shyamrag cp
  • 102
  • 9
  • 3
    While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – Nic3500 Dec 06 '18 at 16:00
  • Chained indexing is *not* recommended. – jpp Dec 06 '18 at 16:41