I would like to apply different functions to each row in the dataframe, depending on which category the row is in.
    def weigh_by_education(row):
        if education_income.education_level == 'College':
            return row / 1013
        if education_income.education_level == 'Doctorate':
            return row / 451
        if education_income.education_level == 'Graduate':
            return row / 3128
        if education_income.education_level == 'High School':
            return row / 2013
        if education_income.education_level == 'Post-Graduate':
            return row / 516
        if education_income.education_level == 'Uneducated':
            return row / 1487
        else:
            return row / 1519
Here's my function. I applied it to my dataframe to try to create a new column called percent: the number of users in each row, weighted by the total number of people in that row's education_level category.
    education_income['percent'] = education_income['user_id'].apply(lambda row: weigh_by_education(row))
However, each time I run it, it throws a ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
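The error most likely comes from the `if` lines: `education_income.education_level == 'College'` compares the whole column at once, which yields one boolean per row instead of a single True or False. A minimal sketch that reproduces this on a made-up frame (the data below is hypothetical, only the column names come from the question):

```python
import pandas as pd

# Hypothetical data, used only to reproduce the error.
education_income = pd.DataFrame({
    "education_level": ["College", "Doctorate"],
    "user_id": [10, 5],
})

# Comparing a whole column to a string gives a boolean Series,
# one value per row, not a single True/False.
mask = education_income.education_level == "College"

try:
    if mask:  # a Series has no single truth value
        pass
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Inside `apply` on a single column, the function only receives the `user_id` value, so it has no way to see that row's `education_level` unless the whole row is passed in.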
My original dataframe is grouped by the columns education_level and income_category, and the values column holds the user counts. I want to weigh the user counts by the total number of people in each education_level category. What can I do?
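One possible approach is to let pandas compute the per-category totals with `groupby(...).transform("sum")` rather than hard-coding them in a function. The sketch below assumes that `education_level` and `income_category` are ordinary columns (for example after `reset_index()`), that `user_id` holds the counts, and it uses invented numbers and income labels purely for illustration:

```python
import pandas as pd

# Hypothetical counts; the real frame would come from grouping by
# education_level and income_category.
education_income = pd.DataFrame({
    "education_level": ["College", "College", "Doctorate", "Doctorate"],
    "income_category": ["Less than $40K", "$40K - $60K",
                        "Less than $40K", "$40K - $60K"],
    "user_id": [30, 10, 6, 2],
})

# Total users per education_level, repeated on every row of that level
# so it lines up with the original frame.
totals = education_income.groupby("education_level")["user_id"].transform("sum")

# Share of each row's count within its education_level (a fraction, 0 to 1).
education_income["percent"] = education_income["user_id"] / totals

print(education_income)
```

Because `transform` returns a Series aligned with the original rows, the division is done column-wise and no `if` chain is needed. Multiply by 100 if an actual percentage is wanted instead of a fraction.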