Use multiple conditions on a column to assign values of new column

Question

I'm trying to assign one of 8 labels to my data based on the strings in an existing column. However, with the method I'm using I get this error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have 144 different strings I'm looking for, that I want to assign to 8 labels.

Here is a simplified example of what I mean. If A is the existing column in my dataframe, I want to create B with the strings assigned depending on the value of A.

Dataframe:

   A     B
0  1   low
1  1   low
2  2   mid
3  3   mid
4  5  high
5  4   mid
6  2   mid
7  5  high

The code I'm using currently is something like:

for index, row in df.iterrows():
    if df['A'] == 1:
        df['Label'] = 'low'
    elif any([df['A'] == 2, df['A'] == 3, df['A'] == 4]):
        df['Label'] = 'mid'
    elif df['A'] == 5:
        df['Label'] = 'high'

I think it is the use of any() that is giving me the error. As I understand it, this is because of how pandas works, but I don't really understand it. Is there any easier way to do this?
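A minimal sketch of where the error actually comes from, assuming the small example frame above. Inside the loop, `df['A'] == 1` compares the whole column, so `if` receives a boolean Series rather than a single True/False. Comparing the current row's scalar, `row['A']`, avoids it. The `None` fallback for unmatched values is an illustrative assumption.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Comparing a whole column to a scalar gives a boolean Series, not one bool
mask = df["A"] == 1

try:
    # `if` needs a single True/False, so pandas refuses to guess
    if mask:
        pass
except ValueError as err:
    print(err)  # The truth value of a Series is ambiguous. ...

# Inside iterrows, compare the row's scalar value instead
labels = []
for _, row in df.iterrows():
    if row["A"] == 1:
        labels.append("low")
    elif row["A"] in (2, 3, 4):
        labels.append("mid")
    elif row["A"] == 5:
        labels.append("high")
    else:
        labels.append(None)  # assumed fallback so lengths always match
df["B"] = labels
```

This works, but it is still a Python-level loop; the vectorized answers below are faster.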

Any help or pointers would be appreciated :)

Seems like you never reach the 'high' condition; is that what you want? — Yuca, Jul 01 '19 at 12:15
`l=[df.A.eq(1),df.A.isin([2,3,4]),df.A.eq(5)]` and then `df['B']=np.select(l,['low','mid','high'])` will do it faster. Don't use iterrows for such cases. — anky, Jul 01 '19 at 12:16
I reach it a couple of times I think. There are two instances of 5 in column A. — ShrutiTurner, Jul 01 '19 at 12:17
but according to your condition it should evaluate to 'mid', no? — Yuca, Jul 01 '19 at 12:18
Ah, apologies - that was a typo. It should have been a 4. Corrected now. — ShrutiTurner, Jul 01 '19 at 12:20

Erfan · Accepted Answer · 2019-07-01T12:32:08.013

There's no need for `iterrows` here. Iterating row by row is slow and considered bad practice when a vectorized method can process the whole column at once.

Method 1 `pd.cut`

df['B'] = pd.cut(df['A'], [0,1,4,10], labels=['low', 'mid', 'high'])

   A     B
0  1   low
1  1   low
2  2   mid
3  3   mid
4  5  high
5  4   mid
6  2   mid
7  5  high
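A slightly expanded sketch of the `pd.cut` call, assuming the same example data. By default the bins are right-closed (`right=True`), so `[0, 1, 4, 10]` means (0, 1], (1, 4] and (4, 10]. The result is a categorical column, and values outside every interval become NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Edges [0, 1, 4, 10] give right-closed intervals (0, 1], (1, 4], (4, 10]
bins = [0, 1, 4, 10]
labels = ["low", "mid", "high"]
df["B"] = pd.cut(df["A"], bins=bins, labels=labels)

# pd.cut returns a categorical column, not plain strings
print(df["B"].dtype)  # category

# 0 is excluded by (0, 1] and 11 is above the last edge, so both become NaN
print(pd.cut(pd.Series([0, 11]), bins=bins, labels=labels).isna().tolist())  # [True, True]
```

`pd.cut` only fits when the labels follow numeric ranges; for arbitrary strings (as in the 144-string case) a lookup-based method is a better fit.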

Method 2 `np.select`

conditions = [
    df['A'] == 1,
    df['A'].isin([2, 3, 4])
]

choices = ['low', 'mid']

df['B'] = np.select(conditions, choices, default='high')

   A     B
0  1   low
1  1   low
2  2   mid
3  3   mid
4  5  high
5  4   mid
6  2   mid
7  5  high
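A sketch of how `np.select` might scale to many values and labels, assuming a hypothetical `groups` dictionary that lists, for each label, the values that should receive it. The `"unknown"` default is also an assumption. Passing a string default is deliberate: the built-in default is the integer 0, which recent NumPy versions may refuse to combine with string choices.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Hypothetical mapping: label -> values that should get that label
groups = {
    "low": [1],
    "mid": [2, 3, 4],
    "high": [5],
}

# One boolean mask per label; np.select picks the first mask that is True
conditions = [df["A"].isin(values) for values in groups.values()]
choices = list(groups.keys())

# Rows matching no condition get the (assumed) string default
df["B"] = np.select(conditions, choices, default="unknown")
```

With 144 strings and 8 labels, only the `groups` dictionary grows; the `np.select` call stays the same.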

score 1 · Answer 2 · answered Jul 01 '19 at 12:43

Why not simply define a function and apply it to the column? It is easy and Pythonic.

def mapper(x):
    # Map a single value of column A to its label
    if x == 1:
        return 'low'
    elif x in [2, 3, 4]:
        return 'mid'
    elif x == 5:
        return 'high'
    else:
        return 'unknown'  # fallback for any unexpected value

df['B'] = df['A'].apply(mapper)

Another option is to build a DataFrame from a dictionary of mappings and join it on column A; this is arguably even more intuitive.
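A minimal sketch of the join approach, assuming a hypothetical `mapping` dictionary from values to labels. A left merge keeps every row of `df` in its original order; values missing from the lookup table get NaN in `B`.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Hypothetical value -> label dictionary turned into a two-column lookup table
mapping = {1: "low", 2: "mid", 3: "mid", 4: "mid", 5: "high"}
lookup = pd.DataFrame(list(mapping.items()), columns=["A", "B"])

# Left merge on A: keeps all rows of df, in order, and attaches the label
df = df.merge(lookup, on="A", how="left")
```

The lookup table could just as well be read from a CSV file, which may be easier to maintain for 144 entries.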

Alternatively, use the Series `map` function with a dictionary.
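A minimal sketch of the `map` approach, assuming a hypothetical `groups` dictionary (label -> values), which is often easier to write by hand when many values share a label. It is inverted into the value -> label dictionary that `Series.map` expects; values not in the dictionary become NaN.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 1, 2, 3, 5, 4, 2, 5]})

# Hypothetical grouping: label -> values
groups = {"low": [1], "mid": [2, 3, 4], "high": [5]}

# Invert into a value -> label dictionary for Series.map
value_to_label = {value: label for label, values in groups.items() for value in values}

# Each value is looked up in the dictionary; unknown values become NaN
df["B"] = df["A"].map(value_to_label)
```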

Ideally, I would try these options from the bottom up, i.e. in increasing order of complexity.

score 0 · Answer 3 · answered Jul 01 '19 at 12:21

Use `.loc` with boolean conditions as the row selector, as follows:

import pandas as pd
from io import StringIO

df = pd.read_csv(StringIO("""
   A
0  1
1  1
2  2
3  3
4  5
5  4
6  2
7  5
"""), sep=r"\s+")

df.loc[df["A"] == 1, "B"] = "low"
df.loc[df["A"].isin((2, 3, 4)), "B"] = "mid"
df.loc[df["A"] == 5, "B"] = "high"

print(df)

Output:

   A     B
0  1   low
1  1   low
2  2   mid
3  3   mid
4  5  high
5  4   mid
6  2   mid
7  5  high

score 0 · Answer 4 · answered Jul 01 '19 at 12:24

The answer from @anky_91 in the comments has solved the problem simply:

l=[df.A.eq(1),df.A.isin([2,3,4]),df.A.eq(5)]
df['B']=np.select(l,['low','mid','high'])

This is much faster and works well.

Thanks for everyone's help! :)

answered Jul 01 '19 at 12:24

ShrutiTurner
