
Let's say I have a DataFrame of integer values from 0 to 100, and I want to classify these values into three parts: low, mid, and high, with low being less than 33, high being greater than 66, and mid in between 33 and 66. So I use:

df['low'] = df['int'] <= 33
df['mid'] = 33  < df['int'] < 66
df['high'] = df['int'] >= 66 

and I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_12080/1299746928.py in <module>
      1 df['low'] = df['int'] <= 33
----> 2 df['mid'] = 33  < df['int'] < 66
      3 df['high'] = df['int'] >= 66

c:\program files\python37\lib\site-packages\pandas\core\generic.py in __nonzero__(self)
   1536     def __nonzero__(self):
   1537         raise ValueError(
-> 1538             f"The truth value of a {type(self).__name__} is ambiguous. "
   1539             "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
   1540         )

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

I have already tried an if/else statement, as well as the and operator and other operators. low and high work, but mid doesn't.

Is there any way around this?

Karl Knechtel

1 Answer

As suggested in the comments, you can use between(33, 66). Note that it includes both edges by default, so 33 and 66 also count as mid:

df['mid'] = df['int'].between(33, 66)
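A small sketch of the edge behaviour, using a made-up Series s (not from the question). The string options for the inclusive parameter ("both", "neither", "left", "right") assume pandas 1.3 or newer; older versions only accepted True/False.

```python
import pandas as pd

# Hypothetical sample values around the 33 and 66 edges
s = pd.Series([32, 33, 50, 66, 67])

# Default inclusive="both": 33 and 66 are counted as mid,
# which overlaps with low (<= 33) and high (>= 66) from the question.
mid_inclusive = s.between(33, 66)
print(mid_inclusive.tolist())  # [False, True, True, True, False]

# inclusive="neither" excludes both edges, like (33 < s) & (s < 66)
mid_exclusive = s.between(33, 66, inclusive="neither")
print(mid_exclusive.tolist())  # [False, False, True, False, False]
```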

You can also write it with two separate comparisons combined with & (not and):

df['mid'] = (33 < df['int']) & (df['int'] < 66)
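Why the original line fails: Python expands a chained comparison such as 33 < df['int'] < 66 into (33 < df['int']) and (df['int'] < 66), and and needs a single True/False for the left-hand Series, which pandas refuses to guess. A minimal sketch with a made-up three-value Series s:

```python
import pandas as pd

s = pd.Series([10, 40, 70])  # hypothetical sample values

# `33 < s < 66` becomes `(33 < s) and (s < 66)`; `and` calls bool()
# on the boolean Series `33 < s`, which raises ValueError.
chained_failed = False
try:
    mid = 33 < s < 66
except ValueError as err:
    chained_failed = True
    print("chained comparison failed:", err)

# `&` combines the two boolean Series element-wise. The parentheses are
# needed because `&` binds more tightly than `<` in Python.
mid = (33 < s) & (s < 66)
print(mid.tolist())  # [False, True, False]
```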

You could also use (not low) and (not high), written with ~ and &:

df['mid'] = ~df['low'] & ~df['high']
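Because mid is built as the complement of low and high, the three columns cover every value exactly once. A quick check, assuming the full integer range 0 to 100:

```python
import pandas as pd

df = pd.DataFrame({'int': range(0, 101)})  # every integer from 0 to 100

df['low'] = df['int'] <= 33
df['high'] = df['int'] >= 66
df['mid'] = ~df['low'] & ~df['high']

# Summing booleans counts True values, so each row should sum to exactly 1
groups_per_row = df[['low', 'mid', 'high']].sum(axis=1)
print((groups_per_row == 1).all())  # True

# Smallest and largest values that ended up in mid
print(df.loc[df['mid'], 'int'].min(), df.loc[df['mid'], 'int'].max())  # 34 65
```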

Minimal working example:

import pandas as pd

data = {
    'int': range(0, 100, 10), 
}

df = pd.DataFrame(data)
 
df['low'] = df['int'] <= 33
df['high'] = df['int'] >= 66

df['mid1'] = (33 < df['int']) & (df['int'] < 66)

df['mid2'] = ~df['low'] & ~df['high']

df['mid-between'] = df['int'].between(33, 66)

print(df)

Result:

   int    low   high   mid1   mid2  mid-between
0    0   True  False  False  False        False
1   10   True  False  False  False        False
2   20   True  False  False  False        False
3   30   True  False  False  False        False
4   40  False  False   True   True         True
5   50  False  False   True   True         True
6   60  False  False   True   True         True
7   70  False   True  False  False        False
8   80  False   True  False  False        False
9   90  False   True  False  False        False

BTW: you can also use pandas.cut():

bins = pd.cut(df['int'], [-1, 33, 66, 100], labels=['low', 'mid', 'high'])

print(bins)

which gives:

0     low
1     low
2     low
3     low
4     mid
5     mid
6     mid
7    high
8    high
9    high
If you don't want to hard-code the outer edges, you can compute them from the data:

import pandas as pd

data = {
    'int': range(0, 100, 10), 
}

df = pd.DataFrame(data)

min_value = df['int'].min() - 1
max_value = df['int'].max() + 1

bins = pd.cut(df['int'], [min_value, 33, 66, max_value], labels=['low', 'mid', 'high'])

print(bins)
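Instead of subtracting 1 from the minimum, pd.cut() also accepts include_lowest=True, which closes the first interval on the left so the smallest edge value is kept. A sketch with made-up values chosen around the edges:

```python
import pandas as pd

s = pd.Series([0, 33, 34, 66, 67, 100])  # hypothetical values near the edges

# Without include_lowest=True, 0 would fall outside (0, 33] and become NaN
bins = pd.cut(s, [0, 33, 66, 100], labels=['low', 'mid', 'high'], include_lowest=True)
print(bins.tolist())  # ['low', 'low', 'mid', 'mid', 'high', 'high']
```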

Note that pd.cut() uses half-open intervals, open on the left and closed on the right (in math notation (a, b]):

min < x <= 33
 33 < x <= 66
 66 < x <= max
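If you need the edges on the other side, pd.cut() has a right=False option that makes every interval closed on the left and open on the right, [a, b). A sketch with made-up values; the top edge 101 is an assumption chosen so that 100 still falls inside the last interval:

```python
import pandas as pd

s = pd.Series([0, 32, 33, 65, 66, 100])  # hypothetical values near the edges

# Intervals become [0, 33), [33, 66), [66, 101)
bins = pd.cut(s, [0, 33, 66, 101], labels=['low', 'mid', 'high'], right=False)
print(bins.tolist())  # ['low', 'low', 'mid', 'mid', 'high', 'high']
```

With right=False, 33 moves into mid and 66 into high, whereas with the default (a, b] intervals 33 is low and 66 is mid.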
furas