Using gather on an already gathered-like data.frame in R

Question

I have a data.frame in R that contains ages, lengths, and the number of individuals in each length group. I want the mean and standard deviation of length for each age group, and I think dplyr will make this easiest. However, I can't figure out how to gather() this particular dataset. Here is the data:

dat <- data.frame(age    = rep(1:5, each = 5),                # ages 1-5, five length groups each
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total  = sample(25:50, 25, replace = TRUE))  # one random count per row

which looks like this:

  age length total
   1      6    38
   1      7    42
   1      8    49
   1      9    28
   1     10    26
   2      8    37

I want it to look like the following, so that I can simply run group_by(age) %>% summarize(mean = mean(length), sd = sd(length)):

age  length
1     6
1     6
1     6
1     6
1     6

and so on, i.e. there should be 38 rows with length 6 for age 1, 42 rows with length 7 for age 1, etc.

How can I achieve this with tidyr's gather() function? I haven't managed to get it working. I'm happy to hear alternative suggestions too.
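gather() turns wide columns into key/value rows; it doesn't duplicate rows, so it isn't really the tool for this. What the desired layout needs is each row repeated `total` times. A minimal base R sketch, reusing the question's data with one count per row and a fixed seed (an assumption, only so the example is reproducible):

```r
set.seed(1)  # assumption: fixed seed only for reproducibility
dat <- data.frame(age    = rep(1:5, each = 5),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total  = sample(25:50, 25, replace = TRUE))

# Repeat each row index `total` times, keeping only age and length
long <- dat[rep(seq_len(nrow(dat)), times = dat$total), c("age", "length")]
rownames(long) <- NULL  # drop the "1.1", "1.2", ... row names created by repetition

# Mean and standard deviation of length within each age group
means <- tapply(long$length, long$age, mean)
sds   <- tapply(long$length, long$age, sd)
res <- data.frame(age  = as.integer(names(means)),
                  mean = as.vector(means),
                  sd   = as.vector(sds))
res
```

Once `long` exists, the `group_by(age) %>% summarize(mean = mean(length), sd = sd(length))` pipeline from the question works on it unchanged. Newer tidyr versions (0.8.0 and later) also provide `tidyr::uncount(dat, total)`, which performs the same row expansion.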

score 1 · Answer 1 · answered Oct 05 '16 at 11:50


How about calculating the weighted mean instead?

library(dplyr)   # also re-exports the %>% pipe, so magrittr isn't needed separately

dat <- data.frame(age    = rep(1:5, each = 5),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total  = sample(25:50, 25, replace = TRUE))

dat %>% 
  group_by(age) %>%
  summarise(mean_length = sum(length * total) / sum(total),  # weighted mean by hand
            wtd_mean    = weighted.mean(length, total))      # same value via weighted.mean()

EDIT: it occurred to me after posting earlier that R has a weighted.mean function that makes this even simpler.
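Why this matches the expanded data: within one age group, let $x_i$ be the lengths and $w_i$ the counts in `total`. Expanding the data repeats each $x_i$ exactly $w_i$ times, so there are $n = \sum_i w_i$ rows, and their mean and sample variance reduce to sums over the original rows:

```latex
\bar{x} = \frac{\sum_i w_i x_i}{\sum_i w_i},
\qquad
s^2 = \frac{\sum_i w_i \, (x_i - \bar{x})^2}{\left(\sum_i w_i\right) - 1}
```

The first expression is exactly what both `sum(length * total) / sum(total)` and `weighted.mean(length, total)` compute; the second is the frequency-weighted form of the variance that `sd()` would give on the expanded rows.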


Benjamin


I had done something like this, but it gets a bit hairier when calculating standard deviation. :| – code_cowboy Oct 05 '16 at 13:19
See `?Hmisc::wtd.var`. `Hmisc` also has a `wtd.quantile` if you're into non-parametric measures. – Benjamin Oct 05 '16 at 13:21
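If you'd rather not pull in Hmisc, a frequency-weighted standard deviation is short to write in base R. A minimal sketch, assuming `total` holds integer counts (frequency weights), so the result should equal `sd()` on the expanded data; `wtd_sd` is a hypothetical helper name, not a function from any package:

```r
# Hypothetical helper: sd of x with frequency weights w, i.e. the sd of x
# after repeating each x[i] exactly w[i] times
wtd_sd <- function(x, w) {
  m <- sum(w * x) / sum(w)                 # weighted mean
  sqrt(sum(w * (x - m)^2) / (sum(w) - 1))  # n - 1 denominator with n = sum(w)
}

set.seed(1)  # assumption: fixed seed only for reproducibility
dat <- data.frame(age    = rep(1:5, each = 5),
                  length = c(6:10, 8:12, 10:14, 12:16, 14:18),
                  total  = sample(25:50, 25, replace = TRUE))

# Per-age summaries computed from the counts, without expanding the data
res <- do.call(rbind, lapply(split(dat, dat$age), function(g)
  data.frame(age  = g$age[1],
             mean = weighted.mean(g$length, g$total),
             sd   = wtd_sd(g$length, g$total))))
res
```

With dplyr loaded, the same helper drops into the pipeline from the answer: `dat %>% group_by(age) %>% summarise(mean = weighted.mean(length, total), sd = wtd_sd(length, total))`.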
