
I'm currently working with 3 data frames named doctorate, high_school and bachelor that look a bit like this:

    ID  age  education  marital_status  occupation    annual_income  Age_25       Age_30       Age_35       Age_40       Age_45       Age_50
1   2   50   doctorate  married         professional  mid            25 and over  30 and over  35 and over  40 and over  45 and over  50 and over
7   8   40   doctorate  married         professional  high           25 and over  30 and over  35 and over  40 and over  under 45     under 50
11  12  45   doctorate  married         professional  mid            25 and over  30 and over  35 and over  40 and over  45 and over  under 50
16  17  44   doctorate  divorced        transport     mid            25 and over  30 and over  35 and over  40 and over  under 45     under 50

I'm trying to create probabilities based on the annual_income column using the following for loop:

income_levels = ['low','mid','high']
education_levels = [bachelor,doctorate,high_school]

for inc_level in income_levels:
    for ed_level in education_levels:
        print(inc_level,len(ed_level[ed_level['annual_income'] == inc_level]) / len(ed_level))

It produces this, which is what I want:

low 0.125
low 0.0
low 0.25
mid 0.625
mid 0.75
mid 0.5
high 0.25
high 0.25
high 0.25

However, I want to append these values to a separate list for each income category: low_income, mid_income and high_income. I'm sure I can modify my for loop to do this, but I can't work out how. Could anyone help me?

Jaimee-lee Lincoln
    Seems like it might make sense to combine the 3 dataframes and then use a groupby as in [this answer](https://stackoverflow.com/a/23377232/7835267), that way your labels and order would be preserved and you could use tolist or todict – G. Anderson May 08 '20 at 02:21
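
The suggestion in the comment above could look roughly like the sketch below. The three frames are made-up stand-ins (only the education and annual_income columns, with invented values), so the numbers will not match the output in the question. It assumes each frame carries its own education label, as in the sample rows shown.

```python
import pandas as pd

# Hypothetical stand-ins for the real frames; the values are invented.
bachelor = pd.DataFrame({"education": "bachelor",
                         "annual_income": ["low", "mid", "mid", "mid"]})
doctorate = pd.DataFrame({"education": "doctorate",
                          "annual_income": ["mid", "mid", "mid", "high"]})
high_school = pd.DataFrame({"education": "high_school",
                            "annual_income": ["low", "mid", "mid", "high"]})

income_levels = ["low", "mid", "high"]

# Stack the three frames into one so a single groupby covers all of them.
combined = pd.concat([bachelor, doctorate, high_school], ignore_index=True)

# Share of each income level within each education group.
rates = (
    combined.groupby("education")["annual_income"]
    .value_counts(normalize=True)
    .unstack(fill_value=0.0)                          # rows: education, columns: income level
    .reindex(columns=income_levels, fill_value=0.0)   # fix column order, keep absent levels
)

# One list per income level; entries follow the row order of `rates`
# (education labels sorted alphabetically).
rates_by_income = {level: rates[level].tolist() for level in income_levels}
print(rates)
print(rates_by_income)
```

Because the education labels stay attached to the rows, it is clear which probability belongs to which group, which the bare lists do not show.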

1 Answer


In this case, you're trying to look up a list by a key (the income level string). Why not just use a dict of lists?

income_levels = ['low','mid','high']
education_levels = [bachelor,doctorate,high_school]

# initial dictionary
inc_level_rates = {il: list() for il in income_levels}

for inc_level in income_levels:
    for ed_level in education_levels:
        rate = len(ed_level[ed_level['annual_income'] == inc_level]) / len(ed_level)
        inc_level_rates[inc_level].append(rate)
        print(inc_level, rate)

print(inc_level_rates)
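
For anyone who wants to try the same idea without the original data, here is a minimal self-contained sketch. The three frames are invented placeholders with only an annual_income column, so the printed numbers differ from those in the question. It also shows one way to unpack the dict into the low_income, mid_income and high_income lists the question asks for.

```python
import pandas as pd

# Invented sample frames (assumption: the real ones have more rows and columns).
bachelor = pd.DataFrame({"annual_income": ["low", "mid", "mid", "mid"]})
doctorate = pd.DataFrame({"annual_income": ["mid", "mid", "mid", "high"]})
high_school = pd.DataFrame({"annual_income": ["low", "mid", "mid", "high"]})

income_levels = ["low", "mid", "high"]
education_levels = [bachelor, doctorate, high_school]

# One empty list per income level, keyed by the level name.
inc_level_rates = {level: [] for level in income_levels}

for level in income_levels:
    for frame in education_levels:
        # The mean of a boolean mask is the fraction of rows that match.
        rate = float((frame["annual_income"] == level).mean())
        inc_level_rates[level].append(rate)

# Unpack into separate names if individual lists are preferred.
low_income, mid_income, high_income = (inc_level_rates[level] for level in income_levels)

print(low_income)
print(mid_income)
print(high_income)
```

Each list holds one value per frame, in the order the frames appear in education_levels.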
Craig Kelly