Combining two subgroups of data in the same dataframe

Question

I have a dataframe like this:

df = data.frame(time=c(2010:2015,2010:2015),
                variable=c(rep("a",6),rep("b",6)),
                value=c(rnorm(6),rnorm(6,mean=10)))

or:

   time variable      value 
1  2010        a -0.5472416
...
6  2015        a -0.2433123
7  2010        b  9.8617777
... 
12 2015        b  7.7674609

I need to create a new variable `c = b - a`. The best solution I've found uses the `dplyr` and `tidyr` packages:

library(dplyr)
library(tidyr)

df <- spread(df, variable, value) %>%
      mutate(c = b - a) %>%
      gather(variable, value, a:c) %>%
      filter(variable == "c")

which gives the requested outcome:

  time variable      value
1 2010        c  10.444794
2 2011        c   8.121627
...
6 2015        c  10.589378

Is there a more direct way to obtain the same result without first spreading and then gathering the data frame?

Are the values within `a` and `b` always in the same order (in terms of subtracting the first `a` from the first `b`, second `a` from the second `b`, and so on)? — David Robinson, May 10 '15 at 13:04
Usually they are (and they are in the example), but I cannot rule out that some years are missing. — Massimo2013, May 10 '15 at 13:07

score 3 · Answer 1 · edited May 23 '17 at 12:22


You could use group_by and summarize:

c <- df %>%
    group_by(time) %>%
    summarize(value = diff(value))

`diff(value)` subtracts the first row of each year from the second, so this assumes the `a` rows come before the `b` rows in the data frame. If you're not sure, you can add an `arrange(variable)` before the `group_by`.
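A minimal sketch of the `arrange(variable)` variant, assuming `dplyr` is installed. The rows are shuffled first (the `set.seed()` call and the names `shuffled`, `res` and `expected` are illustrative only) so that `arrange()` actually has an ordering problem to fix:

```r
library(dplyr)

set.seed(1)
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("a", 6), rep("b", 6)),
                 value = c(rnorm(6), rnorm(6, mean = 10)))

# Simulate an unordered data frame: some b rows now precede their a rows
shuffled <- df[sample(nrow(df)), ]

res <- shuffled %>%
    arrange(variable) %>%            # all "a" rows first, then all "b" rows
    group_by(time) %>%
    summarize(value = diff(value))   # second row minus first row, i.e. b - a

# Direct b - a from the original (ordered) df, for comparison
expected <- df$value[df$variable == "b"] - df$value[df$variable == "a"]
```

Without the `arrange()` step, any year whose `b` row happened to land first would come out with the sign flipped.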

If one variable could have years that aren't in the other (as in your comment), you could get rid of those cases by adding an extra step:

c <- df %>%
    group_by(time) %>%
    filter(n() == 2) %>%
    summarize(value = diff(value))
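To see the extra `filter(n() == 2)` step at work, here is a small sketch, assuming `dplyr` is installed, that removes the `b` observation for 2012 (the `gappy` and `res` names are hypothetical). Since `filter()` on grouped data recalculates the groups by default, the now-incomplete 2012 group is dropped before `summarize()` runs:

```r
library(dplyr)

set.seed(2)
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("a", 6), rep("b", 6)),
                 value = c(rnorm(6), rnorm(6, mean = 10)))

# Simulate a missing year: drop the b row for 2012
gappy <- df[!(df$variable == "b" & df$time == 2012), ]

res <- gappy %>%
    group_by(time) %>%
    filter(n() == 2) %>%             # keep only years that have both a and b
    summarize(value = diff(value))   # b - a, since a rows still come first

res$time   # 2012 is absent from the result
```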

answered May 10 '15 at 13:17 by David Robinson (last edited by Community)

That will work only under the assumption that `b` is always after `a` – David Arenburg May 10 '15 at 13:20
@DavidArenburg I know. I asked about that [above](http://stackoverflow.com/questions/30151717/combining-two-subgroups-of-data-in-the-same-dataframe/30151878#comment48410974_30151717) and added an optional solution. – David Robinson May 10 '15 at 13:22
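The ordering assumption can also be avoided by picking each value by its label rather than its position. This is a sketch that is not part of the original answer; it assumes `dplyr` is installed and that every year has at most one `a` row and one `b` row. It also adds `variable = "c"` after the subtraction so the result has the same columns as the output in the question (the `shuffled` and `res` names are illustrative):

```r
library(dplyr)

set.seed(3)
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("a", 6), rep("b", 6)),
                 value = c(rnorm(6), rnorm(6, mean = 10)))

# Rows in arbitrary order: the label-based lookup does not care
shuffled <- df[sample(nrow(df)), ]

res <- shuffled %>%
    group_by(time) %>%
    filter(n() == 2) %>%   # drop years missing either a or b
    summarize(value = value[variable == "b"] - value[variable == "a"]) %>%
    mutate(variable = "c") %>%           # added after summarize, so the
    select(time, variable, value)        # lookup above still sees "a"/"b"
```

The `variable = "c"` column is added only after `summarize()` because `summarize()` evaluates its arguments in order; defining `variable` first would make the `variable == "b"` lookup see `"c"` instead.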
