I am new to this but trying hard to teach myself. I'm looking at the babynames dataset and trying to build a data frame for the name Kerry with one row per year, a column for the number of females, and a column for the number of males. Here's what I'm doing:

kDF <- babynames %>%
  filter(name == "Kerry") %>%
  group_by(year) %>%
  spread(sex, n)

And my result:

    year  name         prop     F     M
   (dbl) (chr)        (dbl) (int) (int)
1   1920 Kerry 4.019228e-06     5    NA
2   1921 Kerry 5.272723e-06    NA     6
3   1922 Kerry 4.443149e-06    NA     5
4   1923 Kerry 6.181856e-06    NA     7
5   1924 Kerry 1.112053e-05    NA    13
6   1925 Kerry 4.750590e-06     6    NA
7   1925 Kerry 1.215902e-05    NA    14
8   1926 Kerry 8.730209e-06    NA    10
9   1927 Kerry 4.044368e-06     5    NA
10  1927 Kerry 1.205207e-05    NA    14

As you can see, some years are duplicated: 1925 and 1927. What I want is a single row for each of these years with its F and M values side by side. How do I go about this?

Desired output:

    year  name         prop     F     M
   (dbl) (chr)        (dbl) (int) (int)
1   1920 Kerry 4.019228e-06     5    NA
2   1921 Kerry 5.272723e-06    NA     6
3   1922 Kerry 4.443149e-06    NA     5
4   1923 Kerry 6.181856e-06    NA     7
5   1924 Kerry 1.112053e-05    NA    13
6   1925 Kerry 4.750590e-06     6    14 <<<
7   1926 Kerry 8.730209e-06    NA    10
8   1927 Kerry 4.044368e-06     5    14 <<<
Cœur
rynwlms
  • Possible duplicate of [tidyr spread function generates sparse matrix when compact vector expected](http://stackoverflow.com/questions/27501577/tidyr-spread-function-generates-sparse-matrix-when-compact-vector-expected) – jeremycg Dec 03 '15 at 02:36
  • What is prop? What value do you expect it to take in the rows of the result dataframe? – Elin Dec 03 '15 at 02:57
  • Thanks for your help, Elin. I'm not sure I know what you're asking. Unsure of 'prop' or 'it' in this context. But, compared to above, what I'd like to see is:
        year  name         prop     F     M
    6   1925 Kerry 4.750590e-06     6    14
    7   1926 Kerry 8.730209e-06    NA    10
    8   1927 Kerry 4.044368e-06     5    14
    – rynwlms Dec 03 '15 at 03:26
  • So you don't want the `1.205207e-05` in 1927, for example? – Elin Dec 03 '15 at 03:44

1 Answer

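I think you want something like this. The F and M rows for a given year carry different `prop` values, so `spread()` cannot merge them into one row. Summarizing by year and sex first drops `prop`, leaving one row per year after spreading: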

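library(babynames)  # provides the babynames data frame
library(dplyr)
library(tidyr)

answer <-
  babynames %>%
  filter(name == "Kerry") %>%
  # summarizing by year and sex keeps only year, sex, and n (name and prop are dropped)
  group_by(year, sex) %>%
  summarize(n = sum(n)) %>%
  # one column per sex; a year with no count for a sex gets 0 instead of NA
  spread(sex, n, fill = 0)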
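
As a possible alternative (not part of the original answer), here is a minimal sketch that keeps `name` and leaves missing counts as `NA`, as in the desired output. It assumes tidyr 1.0.0 or later, where `pivot_wider()` supersedes `spread()`. The `kerry` tibble is a hypothetical stand-in built from a few rows of the question's output so the example runs without the babynames package; with the real data you would start from `babynames %>% filter(name == "Kerry")` instead.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: rows copied from the question, in long format
# (one row per year and sex), mirroring the babynames column layout.
kerry <- tibble::tribble(
  ~year, ~sex, ~name,   ~n,  ~prop,
  1924,  "M",  "Kerry", 13L, 1.112053e-05,
  1925,  "F",  "Kerry",  6L, 4.750590e-06,
  1925,  "M",  "Kerry", 14L, 1.215902e-05,
  1926,  "M",  "Kerry", 10L, 8.730209e-06,
  1927,  "F",  "Kerry",  5L, 4.044368e-06,
  1927,  "M",  "Kerry", 14L, 1.205207e-05
)

kerry_wide <- kerry %>%
  # prop differs between the F and M rows of the same year, so it must be
  # dropped for those rows to collapse into a single row per year
  select(-prop) %>%
  # one column per sex; years with no row for a sex are filled with NA
  pivot_wider(names_from = sex, values_from = n)

print(kerry_wide)
```

Dropping `prop` is the key step here: any column that varies within a year keeps the rows apart, whichever reshaping function is used.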
bramtayl