
I have looked around (e.g. here), but I can't understand why my code is not working as expected. I have a pandas dataframe and I'd like to add a column that marks the last zero element in column B above a non-zero element.

df = pd.DataFrame({'B':[0,0,1,0,1,0,0,1]})
N = len(df.index)
df['C'] = N*[False]
for i in range(N-1):
    if (df.iloc[i]['B']==0 and df.iloc[i+1]['B']>0):
        df.iloc[i]['C']=True

In spite of having the condition satisfied 3 times, column C is still all false, and I also get a warning that I don't understand:

SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame

Any ideas?

Nonancourt
  • you can read about the SettingWithCopyWarning [here](https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas) and I think to solve it in your case, it would be `df.loc[i,'C']=True` at the last line. But your problem has a way more efficient answer to it, sure someone will answer for that :) – Ben.T Jun 23 '20 at 17:48
  • `df['C']=np.where(df.B.eq(0) & df.B.shift().gt(0), True,False)` – BENY Jun 23 '20 at 17:52

3 Answers


For dataframes with mixed types (like here), df.iloc[i] returns a copy of the row, so df.iloc[i]['C']=True writes to that temporary copy and df itself is left unchanged. Instead of chained indexing, use a single indexing call:

df.iloc[i, df.columns.get_loc('C')]=True

or

df.at[i, 'C'] = True
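
As a minimal sketch, here is the loop from the question with that fix applied. It assumes the default RangeIndex, so that label i and position i coincide (df.at is label-based):

```python
import pandas as pd

df = pd.DataFrame({'B': [0, 0, 1, 0, 1, 0, 0, 1]})
N = len(df.index)
df['C'] = False

for i in range(N - 1):
    # One indexing call per access: the assignment goes into df itself,
    # not into a temporary copy of the row
    if df.at[i, 'B'] == 0 and df.at[i + 1, 'B'] > 0:
        df.at[i, 'C'] = True

print(df['C'].tolist())
# [False, True, False, True, False, False, True, False]
```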

However, I'd suggest replacing your for loop with this, which looks much simpler to me:

df['C'] = [df.iloc[i]['B'] == 0 and df.iloc[i+1]['B'] > 0 for i in range(N - 1)] + [False]

Edit: If you actually want to find the last occurrence of a zero element directly before a non-zero element, try this:

df['C'].where(df['C']).last_valid_index()

This outputs 6
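
To see why that works, here is a small sketch that assumes column C has already been filled in as described above (the values are written out by hand here). The names last and marked are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'B': [0, 0, 1, 0, 1, 0, 0, 1],
    'C': [False, True, False, True, False, False, True, False],
})

# where() keeps the True entries and replaces the False ones with NaN,
# so last_valid_index() returns the label of the last True row
last = df['C'].where(df['C']).last_valid_index()

# For comparison: the labels of all marked rows
marked = df.index[df['C']].tolist()

print(last, marked)
# 6 [1, 3, 6]
```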

user

Sort by index descending, then loop to find the first matching row.

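df = df.sort_index(ascending=False)
df['C'] = False
# Start at 1 so that i-1 never wraps around to the last row
for i in range(1, len(df)):
    # In descending order, row i-1 is the row just below row i in the original frame
    if df.iloc[i, 0] == 0 and df.iloc[i-1, 0] > 0:
        df.iloc[i, 1] = True
        break
df = df.sort_index(ascending=True)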
df

    B   C
0   0   False
1   0   False
2   1   False
3   0   False
4   1   False
5   0   False
6   0   True
7   1   False
David Erickson
  • The OP said that the condition is met 3 times, so one should get 3 `True` in C. Your method returns only one `True`, so I think there is a problem somewhere – Ben.T Jun 23 '20 at 18:02
  • @Ben.T you might be right. The OP also said: "the `last` zero element in column B above a non-zero element." I think better wording might be: "Any zero elements before a one element." – David Erickson Jun 23 '20 at 18:05

You can change df.iloc[i]['C']=True from inside your for loop to df.loc[i, 'C'] = True to make it work.

But I would rather use the following to make it a bit more efficient:

df = pd.DataFrame({'B':[0,0,1,0,1,0,0,1]})

df['Check'] = df['B'].shift(-1)
df['C'] = df['B'] < df['Check']

Out:
   B  Check      C
0  0    0.0  False
1  0    1.0   True
2  1    0.0  False
3  0    1.0   True
4  1    0.0  False
5  0    0.0  False
6  0    1.0   True
7  1    NaN  False
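
A variant of the same idea, sketched under the assumption that the intended condition is exactly "B is zero and the next B is greater than zero". It skips the helper column and does not rely on B containing only 0 and 1:

```python
import pandas as pd

df = pd.DataFrame({'B': [0, 0, 1, 0, 1, 0, 0, 1]})

# shift(-1) aligns each row with the value in the row below it;
# the last row has no successor, so it is compared against NaN and yields False
df['C'] = df['B'].eq(0) & df['B'].shift(-1).gt(0)

print(df['C'].tolist())
# [False, True, False, True, False, False, True, False]
```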
frank