I'm new to R and trying hard to teach myself. I'm exploring the `babynames` dataset and trying to build a data frame for the name Kerry, grouped by year, with one column for the number of female babies and one column for the number of male babies. Here's what I'm doing:
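```r
library(babynames)  # provides the babynames data frame
library(dplyr)      # filter(), group_by(), %>%
library(tidyr)      # spread()
kDF <- babynames %>%
  filter(name == "Kerry") %>%
  group_by(year) %>%
  spread(sex, n)    # one column per sex, filled with n
```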
And my result:
```
    year  name         prop     F     M
   (dbl) (chr)        (dbl) (int) (int)
1   1920 Kerry 4.019228e-06     5    NA
2   1921 Kerry 5.272723e-06    NA     6
3   1922 Kerry 4.443149e-06    NA     5
4   1923 Kerry 6.181856e-06    NA     7
5   1924 Kerry 1.112053e-05    NA    13
6   1925 Kerry 4.750590e-06     6    NA
7   1925 Kerry 1.215902e-05    NA    14
8   1926 Kerry 8.730209e-06    NA    10
9   1927 Kerry 4.044368e-06     5    NA
10  1927 Kerry 1.205207e-05    NA    14
```
As you can see, some years appear twice (1925 and 1927). What I want is a single row for each of these years, with the appropriate F and M values. How do I go about this?
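A minimal sketch of why the duplicates appear, assuming dplyr and tidyr are installed. Instead of the full `babynames` data it uses a small toy tibble (the name `kerry` is hypothetical) built from the 1925-1927 rows in the output above. `spread()` treats every column other than the key (`sex`) and the value (`n`) as a row identifier, and `prop` differs between the F and M rows of the same year, so those rows cannot be merged:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: the 1925-1927 rows from the output above,
# with the same year/sex/name/n/prop columns as babynames.
kerry <- tibble(
  year = c(1925, 1925, 1926, 1927, 1927),
  sex  = c("F", "M", "M", "F", "M"),
  name = "Kerry",
  n    = c(6L, 14L, 10L, 5L, 14L),
  prop = c(4.750590e-06, 1.215902e-05, 8.730209e-06, 4.044368e-06, 1.205207e-05)
)

# year, name AND prop act as identifiers here, so 1925 and 1927
# each produce two rows (one per distinct prop value).
with_prop <- kerry %>% spread(sex, n)
nrow(with_prop)     # 5

# Dropping prop leaves only year and name as identifiers,
# which gives one row per year with both F and M filled in.
without_prop <- kerry %>% select(-prop) %>% spread(sex, n)
nrow(without_prop)  # 3
```

The trade-off is that `prop` is lost, because a single `prop` column cannot hold two different proportions for one year.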
Desired output:
```
    year  name         prop     F     M
   (dbl) (chr)        (dbl) (int) (int)
1   1920 Kerry 4.019228e-06     5    NA
2   1921 Kerry 5.272723e-06    NA     6
3   1922 Kerry 4.443149e-06    NA     5
4   1923 Kerry 6.181856e-06    NA     7
5   1924 Kerry 1.112053e-05    NA    13
6   1925 Kerry 4.750590e-06     6    14   <<<
7   1926 Kerry 8.730209e-06    NA    10
8   1927 Kerry 4.044368e-06     5    14   <<<
```
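If the proportions should be kept as well, one option is to widen `n` and `prop` together with `tidyr::pivot_wider()` (tidyr >= 1.0.0). This is a swapped-in alternative to `spread()`, not an exact reproduction of the layout above: it yields separate `prop_F` and `prop_M` columns rather than a single `prop`. The sketch assumes dplyr and tidyr are installed and uses a hypothetical toy tibble (`kerry`) built from the 1925-1927 rows shown earlier:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: the 1925-1927 rows from the earlier output.
kerry <- tibble(
  year = c(1925, 1925, 1926, 1927, 1927),
  sex  = c("F", "M", "M", "F", "M"),
  name = "Kerry",
  n    = c(6L, 14L, 10L, 5L, 14L),
  prop = c(4.750590e-06, 1.215902e-05, 8.730209e-06, 4.044368e-06, 1.205207e-05)
)

# Spreading both value columns means prop is no longer an identifier,
# so only year and name decide which rows belong together.
wide <- kerry %>%
  pivot_wider(names_from = sex, values_from = c(n, prop))

# Result: one row per year with columns n_F, n_M, prop_F, prop_M;
# sex/year combinations with no data (female Kerry in 1926) become NA.
wide
```

With the real data, the same `pivot_wider()` call would go after `filter(name == "Kerry")`; the `group_by(year)` step is not needed for reshaping.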