Set value of first item in slice in python pandas

Question

I would like to make a slice of a dataframe and then set the value of the first item in that slice without copying the dataframe. For example:

df = pandas.DataFrame(numpy.random.rand(3,1))
df[df[0]>0][0] = 0

The slice here is irrelevant and just for the example; it returns the whole dataframe again. The point is that doing it as in the example produces a SettingWithCopyWarning (understandably). I have also tried slicing first and then using iloc/ix/loc, and using iloc twice, i.e. something like:

df.iloc[df[0]>0,:][0] = 0
df[df[0]>0,:].iloc[0] = 0

Neither of these works. Again, I don't want to make a copy of the dataframe, even if it is just the sliced version.
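To illustrate why the chained form has no effect, here is a minimal sketch. It assumes that boolean indexing (`df[df[0] > 0]`) returns a new object, so the second `[0] = 0` writes to a temporary rather than to `df`. The `before` variable and the warning filter are only there for the demonstration.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 1))
before = df.copy()  # kept only to compare afterwards

with warnings.catch_warnings():
    # Hide the chained-assignment warning for this demo only.
    warnings.simplefilter("ignore")
    # df[df[0] > 0] builds a new object; the assignment lands on that temporary.
    df[df[0] > 0][0] = 0

# The original frame still holds its random values.
print(df.equals(before))
```

This is why the answers below write through a single `loc` or `iloc` call on `df` itself.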

EDIT: It seems there are two ways: using a mask or `idxmax`. The `idxmax` method seems to work if your index is unique, and the mask method if it is not. In my case the index is not unique, which I forgot to mention in the initial post.

jezrael · Answer 1 · 2017-03-03T06:43:10.427

12

I think you can use `idxmax` to get the index of the first True value and then set the value with `loc`:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print (df)
   0
0  1
1  3
2  0
3  0
4  3

print ((df[0] == 0).idxmax())
2

df.loc[(df[0] == 0).idxmax(), 0] = 100
print (df)
     0
0    1
1    3
2  100
3    0
4    3

df.loc[(df[0] == 3).idxmax(), 0] = 200
print (df)
     0
0    1
1  200
2  100
3    0
4    3
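One caveat with this approach: on an all-False condition, `idxmax` still returns the first label, so the assignment would silently hit the first row. A minimal guarded sketch follows; `set_first_match` is a hypothetical helper name, not part of pandas, and it assumes a unique index because the row is addressed by label.

```python
import pandas as pd


def set_first_match(df, cond, column, value):
    """Hypothetical helper: set `column` to `value` at the first row where `cond` is True.

    Assumes a unique index, since the row is addressed by label.
    Returns False (and changes nothing) when no row matches.
    """
    if not cond.any():
        return False  # without this guard, idxmax would point at the first row
    df.loc[cond.idxmax(), column] = value
    return True


df = pd.DataFrame({0: [1, 3, 0, 0, 3]})

no_match_label = (df[0] == 7).idxmax()            # first label, although nothing equals 7
changed_none = set_first_match(df, df[0] == 7, 0, 100)  # no match, df untouched
changed_one = set_first_match(df, df[0] == 0, 0, 100)   # first zero is replaced
print(no_match_label, changed_none, changed_one)
print(df)
```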

EDIT:

Solution with a non-unique index:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df = df.reset_index()
df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.set_index('index')
df.index.name = None
print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT1:

Solution with MultiIndex:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)), index=[1,2,2,3,4])
print (df)
   0
1  1
2  3
2  0
3  0
4  3

df.index = [np.arange(len(df.index)), df.index]
print (df)
     0
0 1  1
1 2  3
2 2  0
3 3  0
4 4  3

df.loc[(df[0] == 3).idxmax(), 0] = 200
df = df.reset_index(level=0, drop=True)

print (df)
     0
1    1
2  200
2    0
3    0
4    3

EDIT2:

Solution with double cumsum:

np.random.seed(1)
df = pd.DataFrame([4,0,4,7,4], index=[1,2,2,3,4])
print (df)
   0
1  4
2  0
2  4
3  7
4  4

mask = (df[0] == 0).cumsum().cumsum()
print (mask)
1    0
2    1
2    2
3    3
4    4
Name: 0, dtype: int32

df.loc[mask == 1, 0] = 200
print (df)
     0
1    4
2  200
2    4
3    7
4    4
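The reason for the second `cumsum` can be seen by comparing the intermediate masks. In this sketch the data is changed (an assumption for illustration) so that the condition matches twice with a non-matching row in between. A single `cumsum() == 1` stays True until the next match, while the double `cumsum` is 1 only at the first match.

```python
import pandas as pd

# Illustrative data: two zeros separated by a non-matching row, non-unique index.
df = pd.DataFrame({0: [4, 0, 4, 0, 4]}, index=[1, 2, 2, 3, 4])
cond = df[0] == 0

single = cond.cumsum() == 1            # True from the first match until the next one
double = cond.cumsum().cumsum() == 1   # True only at the first match
combined = cond & single               # another way to get the same mask as `double`

print(single.tolist())
print(double.tolist())

df.loc[double, 0] = 200
print(df)
```

Because the mask is boolean and positional, it does not depend on the index labels being unique.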

edited Mar 03 '17 at 06:43

answered Feb 28 '17 at 18:30

jezrael


Will this work if the second 7 is not directly after the first, i.e., if the result of `cumsum` on the boolean array would have several `1`s? – juanpa.arrivillaga Feb 28 '17 at 18:34
@juanpa.arrivillaga - thanks, you are right. give me a sec – jezrael Feb 28 '17 at 18:38
I think the most reliable way, and this is only reliable if your index is unique, is to get the index from the slice, then get the first value from the index, and set using that value on the original frame. – juanpa.arrivillaga Feb 28 '17 at 18:38
Ah, `idxmax`, very very clever! – juanpa.arrivillaga Feb 28 '17 at 18:42
@RexFuzzle - Yes, it works if not consecutive values also, see second solution with different condition. – jezrael Feb 28 '17 at 18:46
Very cool, unfortunately, my index is not unique :( Amended question. Seems the mask method may be the only one in my case. Thank you @jezrael – RexFuzzle Feb 28 '17 at 18:49
I also can't reset the index as I need it back- suppose I could save it and put it back again. – RexFuzzle Feb 28 '17 at 18:59
Why can't `reset_index` and `set_index` be used? – jezrael Feb 28 '17 at 19:00
I added another solution with `MultiIndex`. – jezrael Feb 28 '17 at 19:05
For non-unique index `iloc` and `argmax` might be an alternative. – ayhan Feb 28 '17 at 19:15
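The `iloc` and `argmax` idea from the last comment might look like the sketch below. This is one interpretation of that comment, not code from the thread: it takes the position of the first True in the condition and writes through `iloc`, so duplicate labels do not matter. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data with a non-unique index.
df = pd.DataFrame({0: [4, 0, 4, 0, 4]}, index=[1, 2, 2, 3, 4])

cond = (df[0] == 0).to_numpy()   # plain boolean array, one entry per row
if cond.any():                   # argmax returns 0 on an all-False array, so guard first
    pos = cond.argmax()          # position of the first True
    df.iloc[pos, df.columns.get_loc(0)] = 200

print(df)
```

The only extra allocation here is the boolean array for the condition; the index is neither reset nor rebuilt.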

score 1 · Answer 2 · answered Mar 05 '17 at 22:10

Consider the dataframe df

df = pd.DataFrame(dict(A=[1, 2, 3, 4, 5]))

print(df)

   A
0  1
1  2
2  3
3  4
4  5

Create some arbitrary slice slc

slc = df[df.A > 2]

print(slc)

   A
2  3
3  4
4  5

Access the first row of slc within df by using index[0] and loc

df.loc[slc.index[0]] = 0
print(df)

   A
0  1
1  2
2  0
3  4
4  5

answered Mar 05 '17 at 22:10

piRSquared


I was hoping to not duplicate any part of the df as it is large and even the slice could be quite big. – RexFuzzle Mar 06 '17 at 06:13
@RexFuzzle you said the slice was arbitrary and I'm assuming already exists. From that slice, I'm grabbing the first index value and using that to modify the original `df`. – piRSquared Mar 06 '17 at 06:18
I think something like `df.loc[slice, another_slice]` should be less memory intensive than `df.loc[slice].loc[:, another_slice]`. This is possible for row and column slicing at the same time but it appears it is not possible to do it row-wise with different conditions. I am not sure actually, maybe what I have in mind doesn't make sense. – ayhan Mar 09 '17 at 17:39

score 1 · Answer 3 · answered Mar 09 '17 at 17:22

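import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(6,1),index=[1,2,2,3,3,3])
df[1] = 0
df.columns=['a','b']
# label the rows of interest; a single .loc call avoids chained assignment
df.loc[df['a']>=0.5,'b']=1
# DataFrame.sort was removed from pandas; sort_values is the replacement
df=df.sort_values(['b','a'],ascending=[False,True])
# label-based: if the first label is duplicated, every row carrying it is set
df.loc[df[df['b']==0].index.tolist()[0],'a']=0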

This method assigns directly into the dataframe rather than into a copy of it, but it introduces an extra column, which can be dropped after processing. Note that the sort and the boolean selection still build temporary objects. To choose any index instead of the first one, you can change the last line as follows

df.loc[df[df['b']==0].index.tolist()[n],'a']=0

to change any nth item in a slice

df

          a  
1  0.111089  
2  0.255633  
2  0.332682  
3  0.434527  
3  0.730548  
3  0.844724

df after slicing and labelling them

          a  b
1  0.111089  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
3  0.730548  1
3  0.844724  1

After changing value of first item in slice (labelled as 0) to 0

          a  b
3  0.730548  1
3  0.844724  1
1  0.000000  0
2  0.255633  0
2  0.332682  0
3  0.434527  0
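For the "nth item" variant, a positional sketch that avoids the helper column and the sort is shown below. It is an alternative technique, not the method above: `set_nth_match` is a hypothetical helper name, `n` is 0-based and assumed non-negative, and the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd


def set_nth_match(df, cond, column, value, n=0):
    """Hypothetical helper: set `column` to `value` at the n-th (0-based) row where `cond` is True."""
    positions = np.flatnonzero(cond.to_numpy())  # integer positions of all matching rows
    if n >= len(positions):
        raise IndexError("fewer than n + 1 rows satisfy the condition")
    # Positional write, so duplicate index labels are not a problem.
    df.iloc[positions[n], df.columns.get_loc(column)] = value


df = pd.DataFrame({"a": [0.1, 0.2, 0.3, 0.4, 0.7, 0.8]}, index=[1, 2, 2, 3, 3, 3])
set_nth_match(df, df["a"] < 0.5, "a", 0.0, n=2)  # third row with a < 0.5
print(df)
```

The row order and the index are left as they were, and only the boolean condition and the array of matching positions are allocated.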

score 0 · Accepted Answer · answered Mar 27 '17 at 18:37

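Using some of the answers, I managed to find a one-liner for this:

np.random.seed(1)
df = pd.DataFrame(np.random.randint(4, size=(5,1)))
print(df)
   0
0  1
1  3
2  0
3  0
4  3
# double cumsum: a single cumsum() == 1 would also match any rows between the first and second hit
df.loc[(df[0] == 0).cumsum().cumsum() == 1, 0] = 1
print(df)
   0
0  1
1  3
2  1
3  0
4  3

Essentially this uses the mask inline with a double cumsum, so only the first row that satisfies the condition is selected.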
