I have a very large data set with many columns, but I only need two of them: parental education level and gender. Counted by those two columns, the data looks like this:
   parent_edu         gender     n
   <chr>              <chr>  <int>
 1 associate's degree female   116
 2 associate's degree male     106
 3 bachelor's degree  female    63
 4 bachelor's degree  male      55
 5 high school        female    94
 6 high school        male     102
 7 master's degree    female    36
 8 master's degree    male      23
 9 some college       female   118
10 some college       male     108
11 some high school   female    91
12 some high school   male      88
I use the count function to generate the column n, which counts, for each level of parental education, how many females and how many males there are:
student1 %>%
  count(parent_edu, gender)
The final step is to add a last column giving the proportion of each gender within each education level. For example, "some college" is 52% female and 48% male, and "high school" is 48% female and 52% male.
So far, I only have an incomplete attempt with the mutate function:
student1 %>%
count(parent_edu, gender) %>%
mutate(percentage =
Can anyone guide me on what expression I should put in there, or on which other functions I should add with the pipe?
The final result should look like this:
parent_edu         gender     n percentage
<chr>              <chr>  <int>      <dbl>
associate's degree female   116       0.52
associate's degree male     106       0.48
bachelor's degree  female    63       0.53
bachelor's degree  male      55       0.47
high school        female    94       0.48
high school        male     102       0.52
master's degree    female    36       0.61
master's degree    male      23       0.39
some college       female   118       0.52
some college       male     108       0.48
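For reference, a table like the one above can probably be produced by grouping the counts by parent_edu and dividing each n by its group total. This is only a minimal sketch, not necessarily the best approach: the data frame name counts and the rounding to two decimals are my own assumptions, and only the first three education levels from the question are included to keep it short.

```r
library(dplyr)

# Hypothetical stand-in for the output of count(parent_edu, gender);
# the numbers are copied from the question (first three education levels only).
counts <- data.frame(
  parent_edu = rep(c("associate's degree", "bachelor's degree", "high school"),
                   each = 2),
  gender = rep(c("female", "male"), times = 3),
  n = c(116, 106, 63, 55, 94, 102)
)

result <- counts %>%
  group_by(parent_edu) %>%                      # one group per education level
  mutate(percentage = round(n / sum(n), 2)) %>% # share of each gender within the group
  ungroup()

print(result)
```

On the raw data, the same two steps (group_by(parent_edu) followed by mutate(percentage = n / sum(n))) would presumably be piped directly after count(parent_edu, gender), since count returns an ungrouped result when its input is ungrouped.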
Including dput:
df <- structure(list(parent_edu = c("associate's degree", "associate's degree",
"bachelor's degree", "bachelor's degree", "high school", "high school",
"master's degree", "master's degree", "some college", "some college"
), gender = c("female", "male", "female", "male", "female", "male",
"female", "male", "female", "male"), n = c(116, 106, 63, 55,
94, 102, 36, 23, 118, 108)), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))