dplyr: generate new column that takes a percentage of boolean rows

Question

I have a very large data set with many columns, but I will only use two of them: parental education level and gender.

       parent_edu         gender     n
       <chr>              <chr>  <int>
     1 associate's degree female   116
     2 associate's degree male     106
     3 bachelor's degree  female    63
     4 bachelor's degree  male      55
     5 high school        female    94
     6 high school        male     102
     7 master's degree    female    36
     8 master's degree    male      23
     9 some college       female   118
    10 some college       male     108
    11 some high school   female    91
    12 some high school   male      88

From here, I need to use the count function to generate a new column n that counts how many females have parents with that level of education and how many males have parents with that level of education.

    student1 %>%
      count(parent_edu, gender)

The final step is to add a last column giving each gender's share within each education level. For example, "some college" might be 52% female and 48% male, and "high school" 47% female and 53% male. So far, my attempt with the mutate function is incomplete:

    student1 %>%
    count(parent_edu, gender) %>%
    mutate(percentage =

Can anyone guide me on what expression I should put in there, or whether I should pipe into other functions? The final result should look like this:

    parent_edu          gender      n  percentage
    <chr>               <chr>   <int>       <dbl>
    associate's degree  female    116        0.52
    associate's degree  male      106        0.48
    bachelor's degree   female     63        0.53
    bachelor's degree   male       55        0.47
    high school         female     94        0.48
    high school         male      102        0.52
    master's degree     female     36        0.61
    master's degree     male       23        0.39
    some college        female    118        0.52
    some college        male      108        0.48

Including dput:

    df <- structure(list(parent_edu = c("associate's degree", "associate's degree", 
    "bachelor's degree", "bachelor's degree", "high school", "high school", 
    "master's degree", "master's degree", "some college", "some college"
    ), gender = c("female", "male", "female", "male", "female", "male", 
    "female", "male", "female", "male"), n = c(116, 106, 63, 55, 
    94, 102, 36, 23, 118, 108)), row.names = c(NA, -10L), class = c("tbl_df", 
    "tbl", "data.frame"))

[See here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) on making an R question that folks can help with. That includes a sample of data, all necessary code, and some sense of what research you've already done — camille, Mar 22 '20 at 15:40

Matt · Answer 1 · 2020-03-22T17:21:09.407

0

Updated version:

dput

    df <- structure(list(parent_edu = c("associate's degree", "associate's degree", 
    "bachelor's degree", "bachelor's degree", "high school", "high school", 
    "master's degree", "master's degree", "some college", "some college"
    ), gender = c("female", "male", "female", "male", "female", "male", 
    "female", "male", "female", "male"), n = c(116, 106, 63, 55, 
    94, 102, 36, 23, 118, 108)), row.names = c(NA, -10L), class = c("tbl_df", 
    "tbl", "data.frame"))

Solution:

    library(dplyr)

    df <- df %>%
      group_by(parent_edu) %>% # grouping by parent education
      mutate(total = sum(n)) %>% # total count within each education level
      mutate(percentage = (n/total)) %>% # share of each gender within its group (a proportion, 0 to 1)
      mutate(percentage = round(percentage, 2)) %>% # rounding to match your example
      select(-total) # dropping the total column
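
A possible shortening, in case only one extra column is wanted: the group total can be computed inline inside a single `mutate()`, so no temporary `total` column is ever created. This is a minimal sketch that assumes `df` holds the counted data from the dput above (rebuilt here with `data.frame()` so the block runs on its own). The trailing `ungroup()` and the name `result` are additions for illustration, not part of the original answer.

```r
library(dplyr)

# Same values as the dput above, rebuilt so this block is self-contained
df <- data.frame(
  parent_edu = rep(c("associate's degree", "bachelor's degree", "high school",
                     "master's degree", "some college"), each = 2),
  gender = rep(c("female", "male"), times = 5),
  n = c(116, 106, 63, 55, 94, 102, 36, 23, 118, 108)
)

result <- df %>%
  group_by(parent_edu) %>%                      # one group per education level
  mutate(percentage = round(n / sum(n), 2)) %>% # sum(n) is the total within the group
  ungroup()                                     # assumption: drop grouping for later steps

result
```

Inside a grouped `mutate()`, `sum(n)` is evaluated per group, so `n / sum(n)` gives each gender's share of its own education level rather than of the whole table.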

edited Mar 22 '20 at 17:21

answered Mar 22 '20 at 16:05

Matt


Really appreciate it Matt. This gets me closer to understanding it, but I'm only supposed to generate one more column on the final table. Still, thank you! – WOT Mar 22 '20 at 16:09
If you type dput(df) in your console and paste the output, that would really help to be able to troubleshoot this. Thanks! – Matt Mar 22 '20 at 16:24
sorry Matt, this post is my very first one, and it's a mess... – WOT Mar 22 '20 at 16:37
No worries, Mark! I updated my answer above. This seems to get at what you're trying to accomplish. – Matt Mar 22 '20 at 17:21
It returned "Error in sum(n) : invalid 'type' (closure) of argument". I think I'm setting you up for failure by not giving you everything you need, and I'm too new to know exactly what I'm failing to give you. :-/ When I get better, I'll remember to pay it forward! – WOT Mar 22 '20 at 17:52
Everyone is a beginner at some point for everything, don't be so hard on yourself! I think you're just missing a parenthesis somewhere. – Matt Mar 22 '20 at 22:56
Just needed to add the `count(parent_edu, gender)` back in! – WOT Mar 23 '20 at 00:55

score 0 · Accepted Answer · answered Mar 23 '20 at 00:21

Final answer was this:

    student1 %>%
      count(parent_edu, gender) %>%
      group_by(parent_edu) %>% # grouping by parent education
      mutate(total = sum(n)) %>% # total within groups
      mutate(percentage = (n/total)) %>% # calculating percentage
      mutate(percentage = round(percentage, 2)) %>% # rounding to match your example
      select(-total) # dropping the total column
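
A likely explanation for the "invalid 'type' (closure)" error mentioned in the comments: before `count()` runs, the raw data has no column called `n`, so `n` inside `sum(n)` appears to fall back to the function `dplyr::n`, and summing a function fails. The sketch below illustrates this on a small made-up data frame. `student_toy` and its values are hypothetical stand-ins for `student1`, which was never posted.

```r
library(dplyr)

# Hypothetical stand-in for student1: one row per student, no `n` column yet
student_toy <- data.frame(
  parent_edu = c("high school", "high school", "high school",
                 "some college", "some college"),
  gender = c("female", "male", "male", "female", "female")
)

# With no `n` column in the data, the name `n` refers to a function
is.function(dplyr::n)  # TRUE

# Summing a function fails with an "invalid 'type' (closure)" error
err <- tryCatch(sum(dplyr::n), error = function(e) conditionMessage(e))
err

# count() creates the `n` column first, so sum(n) then sees numbers
result <- student_toy %>%
  count(parent_edu, gender) %>%
  group_by(parent_edu) %>%
  mutate(percentage = round(n / sum(n), 2)) %>%
  ungroup()

result
```

This is consistent with the fix found in the comments: adding `count(parent_edu, gender)` back in makes `n` a real column before it is summed.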
