
I have tried code from many answers to similar questions, but nothing has worked for me when I use multiple conditions to decide the value of a column. I also want the column to take one of 3 different values.

The data I have looks like this:

col1 col2 col3 col4 col5
 1     1    1    4    1
 0     1    1    1    1
 0     0    1    1    1

I want to add another column whose value depends on which of columns 1-5 have a value >= 1, so that it looks like this:

col1 col2 col3 col4 col5 category
 1     1    1    4    1   certain
 0     1    1    1    1   probable
 0     0    1    1    1   possible

I have tried code such as this:

import numpy as np
import pandas as pd

df = pd.read_csv('file.csv', header=0)
m1 = df.col1 >= 1 & df.col2 >= 1 & df.col3 >= 1 & df.col4 >= 1 & df.col5 >= 1
m2 = df.col2 >= 1 & df.col3 >= 1 & df.col4 >= 1 & df.col5 >= 1
m3 = df.col3 >= 1 & df.col4 >= 1 & df.col5 >= 1

df['category'] = np.select([m1, m2, m3], ['certain', 'possible', 'probable'], default='Other')

But this raises an error on the line that defines m1:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

From trying to understand this error: do I need to convert each value to True (if it is >= 1) or False (anything else) before running this code?

DN1
Here's a good explanation of conditional creation of new columns: https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column – Erfan Mar 05 '19 at 10:40

3 Answers


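You're missing parentheses around each condition. In Python the bitwise operator & has higher precedence than comparisons such as >=, so df.col1 >= 1 & df.col2 >= 1 is parsed as df.col1 >= (1 & df.col2) >= 1, which is a chained comparison. Python evaluates a chained comparison with an implicit "and", and "and" needs a single truth value for a whole Series, hence the ValueError. Instead use: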

import numpy as np

# Wrap each comparison in parentheses so it is evaluated before the bitwise &
m1 = ((df.col1 >= 1) & (df.col2 >= 1) & (df.col3 >= 1) &
      (df.col4 >= 1) & (df.col5 >= 1))
m2 = (df.col2 >= 1) & (df.col3 >= 1) & (df.col4 >= 1) & (df.col5 >= 1)
m3 = (df.col3 >= 1) & (df.col4 >= 1) & (df.col5 >= 1)

# np.select uses the label of the first condition that is True for each row
df['category'] = np.select([m1, m2, m3], ['certain', 'probable', 'possible'],
                           default='Other')

Which results in the expected output:

   col1  col2  col3  col4  col5  category
0     1     1     1     4     1   certain
1     0     1     1     1     1  probable
2     0     0     1     1     1  possible
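The same masks can also be built without writing out each comparison by hand, which sidesteps the precedence problem entirely. A minimal sketch, assuming the column names col1 to col5 from the question and the labels from its expected output: DataFrame.ge(1) compares every selected cell with 1, and .all(axis=1) is True only for rows where every selected column passes.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': [1, 0, 0],
    'col2': [1, 1, 0],
    'col3': [1, 1, 1],
    'col4': [4, 1, 1],
    'col5': [1, 1, 1],
})

# Element-wise ">= 1" on a subset of columns, then require every column in the row to pass
m1 = df[['col1', 'col2', 'col3', 'col4', 'col5']].ge(1).all(axis=1)
m2 = df[['col2', 'col3', 'col4', 'col5']].ge(1).all(axis=1)
m3 = df[['col3', 'col4', 'col5']].ge(1).all(axis=1)

# Conditions are checked in order, so the strictest mask goes first
df['category'] = np.select([m1, m2, m3], ['certain', 'probable', 'possible'],
                           default='Other')
print(df)
```

Because the column lists are plain Python lists, the conditions can also be generated in a loop if there are more categories.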
yatu

This works:

# Count how many columns in the row are >= 1 and label the row by that count
df['category'] = df.apply(
    lambda x: 'Certain' if sum(x.values >= 1) >= 5
    else 'Probable' if sum(x.values >= 1) >= 4
    else 'Possible',
    axis=1)

Output

   col1  col2  col3  col4  col5  category
0     1     1     1     4     1   Certain
1     0     1     1     1     1  Probable
2     0     0     1     1     1  Possible
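The apply call above runs a Python lambda once per row. A vectorized sketch of the same counting idea, assuming every column in df is one of the numeric colN columns, counts the cells >= 1 in each row with ge(1).sum(axis=1) and maps that count to a label with np.select. Like the lambda, this looks at how many columns pass rather than which ones, so a row such as 1 1 1 1 0 would also be labelled Probable.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'col1': [1, 0, 0],
    'col2': [1, 1, 0],
    'col3': [1, 1, 1],
    'col4': [4, 1, 1],
    'col5': [1, 1, 1],
})

# Number of columns in each row whose value is >= 1
# (computed before the category column is added, so only colN columns are counted)
n_ok = df.ge(1).sum(axis=1)

# Same thresholds as the lambda: 5 -> Certain, 4 -> Probable, otherwise Possible
df['category'] = np.select([n_ok >= 5, n_ok >= 4], ['Certain', 'Probable'],
                           default='Possible')
print(df)
```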
iamklaus

Create a function and apply it to each row of the DataFrame:

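def create_new_column(row):
    # Return 1 when both columns are greater than 1, otherwise 0
    if row['column1'] > 1 and row['column2'] > 1:
        return 1
    else:
        return 0

# axis=1 passes each row to the function; no lambda wrapper is needed
df['new_column'] = df.apply(create_new_column, axis=1)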
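Applied to the question's data, the same row-function pattern might look like the sketch below. The function name categorize is hypothetical; its checks mirror the m1/m2/m3 conditions from the question, and the labels follow the question's expected output.

```python
import pandas as pd

def categorize(row):
    # Checked from strictest to loosest, so the first match wins
    if all(row[c] >= 1 for c in ['col1', 'col2', 'col3', 'col4', 'col5']):
        return 'certain'
    if all(row[c] >= 1 for c in ['col2', 'col3', 'col4', 'col5']):
        return 'probable'
    if all(row[c] >= 1 for c in ['col3', 'col4', 'col5']):
        return 'possible'
    return 'Other'

# Sample data from the question
df = pd.DataFrame({
    'col1': [1, 0, 0],
    'col2': [1, 1, 0],
    'col3': [1, 1, 1],
    'col4': [4, 1, 1],
    'col5': [1, 1, 1],
})

# Each row arrives as a Series indexed by column name
df['category'] = df.apply(categorize, axis=1)
print(df)
```

Inside the function the comparisons are on single values, so plain "and"/all() work there; the precedence issue only arises when combining whole Series with &.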
Kevin