I have the following dataset:

library(babynames)
hadley <- dplyr::filter(babynames, name == "Hadley")


    year   sex   name     n         prop
   <dbl> <chr>  <chr> <int>        <dbl>
1   1906     M Hadley     6 4.164584e-05
2   1908     M Hadley    16 9.616887e-05
3   1909     M Hadley    14 7.915552e-05
4   1910     M Hadley     5 2.397783e-05
5   1911     M Hadley     9 3.728375e-05
6   1912     M Hadley    11 2.436566e-05
7   1913     M Hadley    10 1.864830e-05
8   1914     M Hadley    15 2.195171e-05
9   1915     M Hadley    14 1.589197e-05
10  1916     M Hadley    14 1.516359e-05
# ... with 147 more rows

The plot shows that some observations should be merged:

ggplot(hadley, aes(year, n)) + geom_line()

I tried the aggregate function, but it fails because the data also contains character columns (sex and name), which cannot be summed:

d <- aggregate(x = hadley,by = list(hadley$year),'sum')

How can I correct the code?

Daniel Yefimov
  • What do you want to sum? `n` and `prop` for that particular year? – Ronak Shah Aug 09 '16 at 10:02
  • Do you mean that the graph looks odd in later years because there is a row for male (low N) and female (high N) for each year? E.g., `ggplot(hadley %>% filter(year > 1990), aes(year, n)) + geom_line()`. If you want to aggregate both sexes into a single year, you are almost there. You need `x = hadley$n`. See http://stackoverflow.com/questions/1660124/how-to-sum-a-variable-by-group. – Sam Firke Aug 09 '16 at 10:08
  • @RonakShah I want to add 'n' – Daniel Yefimov Aug 09 '16 at 10:08
  • @SamFirke Yes, I meant exactly what you said! Thank you for the response:) – Daniel Yefimov Aug 09 '16 at 10:10
  • Another equivalent is `aggregate(n~year, hadley,sum)` – Ronak Shah Aug 09 '16 at 10:12

1 Answer

Your problem is that there are separate entries for 'Hadley' for males and females. You could either plot them separately:

ggplot(hadley, aes(year, n, group = sex, colour = sex)) + geom_line()

Or you could merge them as you asked:

library(dplyr)
hadley2 <- hadley %>%
  group_by(year) %>%
  summarize(numbers = sum(n))


ggplot(hadley2, aes(year, numbers)) + geom_line()
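
For comparison, here is a minimal base R sketch of the same merge, following the `aggregate` calls suggested in the comments. The object names `hadley_by_year` and `hadley_by_year2` are made up for illustration. The key point is that only `n` is passed to the summing function, so the character columns never reach `sum`.

```r
library(babynames)

# Same subset as in the question, using base R instead of dplyr
hadley <- subset(babynames, name == "Hadley")

# Formula interface: sum n within each year (both sexes combined)
hadley_by_year <- aggregate(n ~ year, data = hadley, FUN = sum)

# x/by interface: pass only the numeric column to be summed
hadley_by_year2 <- aggregate(x = list(n = hadley$n),
                             by = list(year = hadley$year),
                             FUN = sum)
```

Either result has one row per year and can be plotted with `ggplot(hadley_by_year, aes(year, n)) + geom_line()`.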
mkt