I'm currently working with 3 data frames named doctorate, high_school and bachelor that look a bit like this:
    ID  age  education  marital_status    occupation  annual_income       Age_25       Age_30       Age_35       Age_40       Age_45       Age_50
 1   2   50  doctorate         married  professional            mid  25 and over  30 and over  35 and over  40 and over  45 and over  50 and over
 7   8   40  doctorate         married  professional           high  25 and over  30 and over  35 and over  40 and over     under 45     under 50
11  12   45  doctorate         married  professional            mid  25 and over  30 and over  35 and over  40 and over  45 and over     under 50
16  17   44  doctorate        divorced     transport            mid  25 and over  30 and over  35 and over  40 and over     under 45     under 50
I'm trying to create probabilities based on the annual_income column using the following for loop:
income_levels = ['low', 'mid', 'high']
education_levels = [bachelor, doctorate, high_school]
for inc_level in income_levels:
    for ed_level in education_levels:
        print(inc_level, len(ed_level[ed_level['annual_income'] == inc_level]) / len(ed_level))
It produces this, which is what I want:
low 0.125
low 0.0
low 0.25
mid 0.625
mid 0.75
mid 0.5
high 0.25
high 0.25
high 0.25
However, I want to append these values to a list depending on the income category. The lists would be low_income, mid_income and high_income. I'm sure there's a way to modify my for loop to do this, but I can't bridge the gap to get there. Could anyone help me?
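
To make the goal concrete, here is a rough sketch of the kind of result I'm after. It collects the values in a dictionary of lists keyed by income level and then pulls out the three lists. The three data frames below are small made-up stand-ins (only the annual_income column, with counts chosen to match the output above), and the name probabilities is just a placeholder I picked; this is one possible approach, not necessarily the best one.

```python
import pandas as pd

# Made-up stand-ins for the real data frames (assumption: only the
# annual_income column matters for this calculation).
bachelor = pd.DataFrame({"annual_income": ["low"] + ["mid"] * 5 + ["high"] * 2})
doctorate = pd.DataFrame({"annual_income": ["mid", "high", "mid", "mid"]})
high_school = pd.DataFrame({"annual_income": ["low", "mid", "mid", "high"]})

income_levels = ["low", "mid", "high"]
education_levels = [bachelor, doctorate, high_school]

# One empty list per income level, keyed by the level name
# ("probabilities" is a hypothetical name).
probabilities = {inc_level: [] for inc_level in income_levels}

for inc_level in income_levels:
    for ed_level in education_levels:
        # Same calculation as in the loop above: share of rows at this income level.
        share = len(ed_level[ed_level["annual_income"] == inc_level]) / len(ed_level)
        probabilities[inc_level].append(share)

# Each list is ordered like education_levels: bachelor, doctorate, high_school.
low_income = probabilities["low"]
mid_income = probabilities["mid"]
high_income = probabilities["high"]

print(low_income)   # [0.125, 0.0, 0.25]
print(mid_income)   # [0.625, 0.75, 0.5]
print(high_income)  # [0.25, 0.25, 0.25]
```

Keeping the lists in a dictionary avoids a chain of if/elif checks on inc_level inside the loop, and the separate variables can still be taken from it afterwards.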