0

My dataframe looks like this:

    mid price   dse_high_born
0   0.002039    False
1   0.002039    False
2   0.002039    False
3   0.002039    False
4   0.002039    False
5   0.002038    False
6   0.002039    True
7   0.002037    False
8   0.002037    False
9   0.002037    False
10  0.002036    False
11  0.002036    False
12  0.002038    False
13  0.002038    False
14  0.002038    False
15  0.002038    False
16  0.002039    False
17  0.002039    False
18  0.002040    False
19  0.002040    False
20  0.002040    False
21  0.002039    False
22  0.002039    False
23  0.002039    False
24  0.002040    True
25  0.002040    False
26  0.002041    False
27  0.002041    False
28  0.002041    False
29  0.002042    False
30  0.002044    False
31  0.002049    True
32  0.002049    False
33  0.002048    False

... ...

I tried to use a for loop to add a new column, price, based on a condition, as follows:

for index, row in df.iterrows():
    if df['dse_high_born'] == True:
        df.at[index,'price'] = row['mid price']
    else:
        df.at[index,'price'] = 'nan'

I received the following error: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I tried every combination (bool(), any(), item(), etc.), but when I then filter with df[df['price'] != 'nan'], no rows match the condition. Any idea why? Thanks!

Viktor.w

2 Answers

3

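This can be done much more simply and efficiently with np.where (the column is written here as mid_price; if your column name contains a space, use df['mid price'] instead):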

import numpy as np
df['price'] = np.where(df.dse_high_born, df.mid_price, np.nan)

    mid_price  dse_high_born  price
0       0.002          False    NaN
1       0.002          False    NaN
2       0.002          False    NaN
3       0.002          False    NaN
4       0.002          False    NaN
5       0.002          False    NaN
6       0.002           True  0.002
7       0.002          False    NaN
...
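
As a self-contained sketch of the same idea, the snippet below rebuilds four rows from the question (indices 4 to 7) and then selects the flagged rows. The variable name flagged and the use of notna() for filtering are illustrative choices, not part of the original answer, and the column is assumed to be named mid_price.

```python
import numpy as np
import pandas as pd

# Four rows copied from the question (indices 4 to 7);
# the column name mid_price is an assumption.
df = pd.DataFrame(
    {
        "mid_price": [0.002039, 0.002038, 0.002039, 0.002037],
        "dse_high_born": [False, False, True, False],
    },
    index=[4, 5, 6, 7],
)

# Keep mid_price where the flag is True, otherwise store a real NaN.
df["price"] = np.where(df["dse_high_born"], df["mid_price"], np.nan)

# Select the rows that received a price. notna() is used because
# NaN never compares equal to anything, including the string 'nan'.
flagged = df[df["price"].notna()]
print(flagged)
```

Storing np.nan rather than the string 'nan' keeps the price column numeric, so notna(), isna() and dropna() can be used on it directly.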

The problem with your code is that the condition in the if statement, df['dse_high_born'] == True, compares the entire column rather than the value in the current row, so it produces a boolean Series whose truth value is ambiguous. You need to index on both row and column using .loc, as in df.loc[index, 'dse_high_born']. So you want something like:

for index, row in df.iterrows():
    if df.loc[index,'dse_high_born'] == True:
        df.loc[index,'price'] = df.loc[index,'mid_price']
    else:
        df.loc[index,'price'] = np.nan
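
To see where the error comes from, here is a minimal sketch with a hypothetical three-element Series named flags. Comparing the whole Series yields another Series, which cannot be used as an if condition, whereas a single element selected with .loc gives a plain boolean.

```python
import pandas as pd

# Hypothetical stand-in for the dse_high_born column.
flags = pd.Series([False, True, False])

# Element-wise comparison: the result is a Series of three booleans.
comparison = flags == True

try:
    if comparison:  # pandas cannot reduce three values to one truth value
        pass
    raised = False
except ValueError as exc:
    raised = True
    message = str(exc)

# Selecting one element first gives a single boolean, which is fine in an if.
scalar = flags.loc[1] == True
print(raised, bool(scalar))
```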
yatu
1

The error refers to df['dse_high_born'] == True. I think df should be replaced by row, like this:

for index, row in df.iterrows():
    if row['dse_high_born'] == True:
        df.at[index,'price'] = row['mid price']
    else:
        df.at[index,'price'] = 'nan'
erncyp