I am trying to run a function that sorts the age of a house into categories and then creates a new column in the original dataframe from the result. Here is the code for the if statement:

def sort_age(data):
    if (data["housing_median_age"] > 40) : 
        return ('Cat 5')
    elif ((30 <= data["housing_median_age"]) & (data["housing_median_age"] <= 40)) : 
        return ('Cat 4')
    elif ((20 <= data["housing_median_age"]) & (data["housing_median_age"] < 30)) :
        return ('Cat 3')
    elif (10 <= data["housing_median_age"] < 20) : 
        return ('Cat 2')
    elif (0 <= data["housing_median_age"] < 10) : 
        return ('Cat 1')
    else:
        return ('None')


# Here's the code for the new column:
p1data['age_category'] = p1data.apply(lambda x: sort_age(p1data), axis = 1)


The error message looks like this:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index 0')
Nalhcal
  • use [numpy.select](https://docs.scipy.org/doc/numpy/reference/generated/numpy.select.html) – It_is_Chris Apr 03 '20 at 13:10
  • `rng = [0,10,20,30,40] category = pd.cut(data["housing_median_age"], rng, labels=True,right=True) ` It would be easier for me to add a new column of category values and aggregate them. – r-beginners Apr 03 '20 at 13:18

1 Answer

To perform this kind of categorization, use a function dedicated to the task, namely pd.cut. The code can be:

p1data['age_category'] = pd.cut(p1data.housing_median_age,
    bins=[0, 10, 20, 30, 40.001, 200], right=False,
    labels=['Cat 1', 'Cat 2', 'Cat 3', 'Cat 4', 'Cat 5'])

Details:

  • bins - the bin edges. Note one irregularity: all edges are integers except 40.001, which is there so that a value of exactly 40 is categorized as Cat 4.
  • right=False - to make bins open on the right side.
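
A minimal sketch to check the bin edges described above. The sample ages are made up for illustration (they are not data from the question) and were chosen to probe each boundary:

```python
import pandas as pd

# Hypothetical sample data covering the boundaries of each bin.
p1data = pd.DataFrame({"housing_median_age": [0, 9, 10, 20, 30, 40, 41]})

p1data["age_category"] = pd.cut(
    p1data["housing_median_age"],
    bins=[0, 10, 20, 30, 40.001, 200],
    right=False,  # each interval includes its left edge and excludes its right edge
    labels=["Cat 1", "Cat 2", "Cat 3", "Cat 4", "Cat 5"],
)

# Convert to plain strings so the result is easy to inspect.
result = p1data["age_category"].astype(str).tolist()
print(result)
```

Note that values outside the range covered by the bins (here, below 0 or at 200 and above) come back as NaN rather than the string 'None' returned by the original function.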

The advantage of this solution is that the assigned names are Pandas categories, so even if the names were not alphabetically ordered, their logical order is kept after any sorting.
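
To illustrate the ordering point, here is a small sketch with hypothetical labels ("new", "middle", "old") picked so that alphabetical order and logical order differ. The bins and values are assumptions for the example only:

```python
import pandas as pd

# Hypothetical ages and labels; alphabetical order would be middle, new, old.
ages = pd.Series([45, 5, 25])
cats = pd.cut(
    ages,
    bins=[0, 20, 40, 200],
    right=False,
    labels=["new", "middle", "old"],
)

# pd.cut returns an ordered categorical, so sorting follows the bin order,
# not the alphabetical order of the label strings.
is_ordered = cats.cat.ordered
sorted_labels = cats.sort_values().astype(str).tolist()
print(is_ordered)
print(sorted_labels)
```

Sorting plain strings instead would put "middle" first, which is why keeping the column as a category can be convenient for grouping and plotting.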

Valdi_Bo