
I have the following dataset:

[Image: dataset screenshot, not reproduced here]

I am trying to tell pandas that:

If Report No is below 30, it should create a new variable equal to

df_bei_index[col]*0.05 + df_bei_index['PDI_Average']*0.95

If Report No is greater than or equal to 30, the new variable should simply equal

df_bei_index[col]

I wrote the following code:

for col in col_list:
    if df_bei_index['Report No'] <= 29:
        df_bei_index[col+'_final'] = df_bei_index[col]*0.05 + df_bei_index['PDI_Average']*0.95
    else:
        df_bei_index[col+'_final'] = df_bei_index[col]

But I get this error


ValueError                                Traceback (most recent call last)
 in ()
     10 
     11 for col in col_list:
---> 12     if df_bei_index['Report No'] <= 29:
     13         df_bei_index[col+'_final'] = df_bei_index[col]*0.05 + df_bei_index['PDI_Average']*0.95
     14     else:

~\Anaconda3\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1574         raise ValueError("The truth value of a {0} is ambiguous. "
   1575                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1576                          .format(self.__class__.__name__))
   1577 
   1578     __bool__ = __nonzero__

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Filippo Sebastio

2 Answers


Check this answer: Python Use if function: ValueError:Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()

You may want to use np.where:

import numpy as np

for col in col_list:
    # Blend where Report No <= 29; otherwise keep the original value
    df_bei_index[col+'_final'] = np.where(df_bei_index['Report No'] <= 29, df_bei_index[col]*0.05 + df_bei_index['PDI_Average']*0.95, df_bei_index[col])

I am assuming that col_list does not include your 'Country' column.
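
To try the np.where approach without the original data, here is a minimal, self-contained sketch. The frame below is a made-up stand-in for df_bei_index: the 'score' column and all values are assumptions for illustration only, and only 'Report No' and 'PDI_Average' come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_bei_index; 'score' and all values are invented.
df_bei_index = pd.DataFrame({
    "Report No": [28, 29, 30, 31],
    "PDI_Average": [100.0, 100.0, 100.0, 100.0],
    "score": [20.0, 40.0, 60.0, 80.0],
})
col_list = ["score"]  # assumed list of columns to transform

# One boolean per row: True where the blended value should be used.
low_report = df_bei_index["Report No"] <= 29

for col in col_list:
    blended = df_bei_index[col] * 0.05 + df_bei_index["PDI_Average"] * 0.95
    # np.where picks element-wise: blended where the mask is True, else the original.
    df_bei_index[col + "_final"] = np.where(low_report, blended, df_bei_index[col])

print(df_bei_index)
```

Computing the mask once outside the loop avoids re-evaluating the same comparison for every column.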

Jorge

An expression like df_bei_index['Report No'] <= 29 evaluates to a boolean Series (one True/False per row), so you cannot use it in an if statement, which needs a single truth value. You can, however, use it as a row mask in .loc for your dataframe:

import pandas as pd

data = {'a': list(range(20)), 'b': list(range(6, 26))}
df = pd.DataFrame(data=data)

# Boolean masks: one True/False per row
condition1 = df.a <= 10
condition2 = df.a > 10

# Assign only to the rows selected by each mask
df.loc[condition1, 'a_1'] = df.loc[condition1, 'a'] * 2
df.loc[condition2, 'a_1'] = df.loc[condition2, 'a'] * 5
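
Applied to the question's formula, the same .loc idea might look like the sketch below. The sample frame is again hypothetical (the 'score' column and the numbers are assumptions, not the asker's data); it copies the column first and then overwrites only the rows with Report No <= 29.

```python
import pandas as pd

# Hypothetical stand-in for df_bei_index; 'score' and all values are invented.
df_bei_index = pd.DataFrame({
    "Report No": [28, 29, 30, 31],
    "PDI_Average": [100.0, 100.0, 100.0, 100.0],
    "score": [20.0, 40.0, 60.0, 80.0],
})
col_list = ["score"]  # assumed list of columns to transform

low_report = df_bei_index["Report No"] <= 29  # boolean mask, one value per row

for col in col_list:
    final = col + "_final"
    # Default: keep the original value for every row.
    df_bei_index[final] = df_bei_index[col]
    # Overwrite only the masked rows with the blended value.
    df_bei_index.loc[low_report, final] = (
        df_bei_index.loc[low_report, col] * 0.05
        + df_bei_index.loc[low_report, "PDI_Average"] * 0.95
    )

print(df_bei_index)
```

Passing the mask and the column label together to .loc, rather than chaining df.loc[mask][col], keeps the selection in a single indexing step.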