I have a dataframe like this:

df = data.frame(time=c(2010:2015,2010:2015),
                variable=c(rep("a",6),rep("b",6)),
                value=c(rnorm(6),rnorm(6,mean=10)))

or:

   time variable      value 
1  2010        a -0.5472416
...
6  2015        a -0.2433123
7  2010        b  9.8617777
... 
12 2015        b  7.7674609

I need to create a new variable 'c=b-a'. The best solution I've found uses the packages 'dplyr' and 'tidyr':

df <- spread(df,variable,value) %>% 
      mutate(c=b-a) %>% 
      gather(variable,value,a:c) %>%
      filter(variable=="c")

which gives the requested outcome:

  time variable      value
1 2010        c  10.444794
2 2011        c   8.121627
...
6 2015        c  10.589378

Is there a more direct way to obtain the same result, one that does not require first spreading and then gathering the data frame?

Massimo2013

1 Answer

You could use group_by and summarize:

c <- df %>%
    group_by(time) %>%
    summarize(value = diff(value))

Note that this assumes the a rows come before the b rows within each year, so that diff(value) returns b - a. If you're not sure, you can add an arrange(variable) before the group_by.
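As a minimal sketch of that arrange(variable) variant: the seed, the shuffled copy of the data, and the names `shuffled` and `c_sorted` are my own illustrative choices, not part of the question.

```r
library(dplyr)

set.seed(1)  # hypothetical seed, only so the example is reproducible
df <- data.frame(time = c(2010:2015, 2010:2015),
                 variable = c(rep("a", 6), rep("b", 6)),
                 value = c(rnorm(6), rnorm(6, mean = 10)))

# Shuffle the rows so that a no longer reliably precedes b
shuffled <- df[sample(nrow(df)), ]

c_sorted <- shuffled %>%
    arrange(variable) %>%           # put every a row ahead of every b row
    group_by(time) %>%
    summarize(value = diff(value))  # within each year: b - a
```

Sorting first means the result should not depend on the original row order, at the cost of one extra pass over the data.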

If one variable could have years that aren't in the other (as in your comment), you could get rid of those cases by adding an extra step:

c <- df %>%
    group_by(time) %>%
    filter(n() == 2) %>%
    summarize(value = diff(value))
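
To illustrate the filtering step, here is a small sketch on made-up data in which b has a year (2013) that a lacks. The data frame `df2`, its values, and the name `c_complete` are hypothetical and chosen only so the arithmetic is easy to follow.

```r
library(dplyr)

# Hypothetical data: a covers 2010-2012, b covers 2010-2013
df2 <- data.frame(time = c(2010:2012, 2010:2013),
                  variable = c(rep("a", 3), rep("b", 4)),
                  value = c(1, 2, 3, 11, 13, 15, 17))

c_complete <- df2 %>%
    group_by(time) %>%
    filter(n() == 2) %>%            # keep only years present for both a and b
    summarize(value = diff(value))  # b - a, since a rows come first here

# c_complete has one row each for 2010, 2011 and 2012,
# with values 10, 11 and 12; 2013 is dropped
```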
David Robinson
  • That will work only under the assumption that `b` is always after `a` – David Arenburg May 10 '15 at 13:20
  • @DavidArenburg I know. I asked about that [above](http://stackoverflow.com/questions/30151717/combining-two-subgroups-of-data-in-the-same-dataframe/30151878#comment48410974_30151717) and added an optional solution. – David Robinson May 10 '15 at 13:22