Subtraction based on two factors

Question

My dataframe looks like so:

group <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
value <- c(3:6, 1:4, 4:9)
type <- c("d", "d", "e", "e", "g", "g", "e", "e", "d", "d", "e", "e", "f", "f")
df <- cbind.data.frame(group, value, type)

df
   group value type
1      A     3    d
2      A     4    d
3      A     5    e
4      A     6    e
5      B     1    g
6      B     2    g
7      B     3    e
8      B     4    e
9      C     4    d
10     C     5    d
11     C     6    e
12     C     7    e
13     C     8    f
14     C     9    f

Within each level of the factor "group" I would like to subtract the values based on "type". For group "A", for example, that means 3 - 5 (1st value of d minus 1st value of e) and 4 - 6 (2nd value of d minus 2nd value of e). My outcome should look similar to this:

A
  group d_e
1     A  -2
2     A  -2

B
  group g_e
1     B  -2
2     B  -2

C
  group d_e d_f e_f
1     C  -2  -4  -2
2     C  -2  -4  -2

So if, as for group C, there are more than two types, I would like to calculate the difference for every pair of types.
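The rule described above (subtract the values of two types position by position, for every pair of types in a group) can be illustrated on group C alone with base R's `split()` and `combn()`. This is a minimal sketch, not code from the thread; it assumes every type in the group has the same number of rows and that the group has at least two types.

```r
# Rows of group C from the question
value <- c(4, 5, 6, 7, 8, 9)
type  <- c("d", "d", "e", "e", "f", "f")

# Split the values by type, keeping the order in which the types first appear
types <- unique(type)
vals  <- split(value, factor(type, levels = types))  # d = c(4, 5), e = c(6, 7), f = c(8, 9)

# Every pair of types, one column per pair: d/e, d/f, e/f
pairs <- combn(types, 2)

# Position-wise difference for each pair (1st d - 1st e, 2nd d - 2nd e, ...)
out <- apply(pairs, 2, function(p) vals[[p[1]]] - vals[[p[2]]])
colnames(out) <- apply(pairs, 2, paste, collapse = "_")
data.frame(group = "C", out)
```

The printed data frame has the columns `group`, `d_e`, `d_f` and `e_f`, with one row per position, matching the layout of the desired output for group C.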

Having read this post, I reckon I could use ddply and transform. However, I am struggling to find a way to assign the types automatically, given that each group contains different types and also a different number of types.

Do you have any suggestions as to how I could manage that?

Please note that the printed 'df' differs from the one you create above. E.g. no 'g' in the 'type' vector. — Henrik, Feb 19 '14 at 16:30

G. Grothendieck · Accepted Answer · 2014-09-02T10:21:07.050


It's not clear why the sample output in the question has two identical rows in each group rather than just one, but at any rate this produces output similar to that shown:

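# Keep the first row of each group/type combination (df[-2] drops "value")
DF <- df[!duplicated(df[-2]), ]

# For one group: negated diff() of every pair of values, named "type1_type2".
# x$group[1:2] makes the result two (identical) rows, as in the question.
f <- function(x) setNames(
  data.frame(group = x$group[1:2], as.list(-combn(x$value, 2, diff))),
  c("group", combn(x$type, 2, paste, collapse = "_"))
)

by(DF, DF$group, f)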

giving:

DF$group: A
  group d_e
1     A  -2
2     A  -2
------------------------------------------------------------ 
DF$group: B
  group d_e
1     B  -2
2     B  -2
------------------------------------------------------------ 
DF$group: C
  group d_e d_f e_f
1     C  -2  -4  -2
2     C  -2  -4  -2

REVISED: minor improvements.

edited Sep 02 '14 at 10:21

answered Feb 19 '14 at 16:34

Thanks a lot for your help. The two output rows are actually not identical; I just chose the example values poorly. The function also works without removing the "duplicates". However, my actual dataset is much bigger (about 20 groups with at least 500 cases per group), so the function takes quite long and the output is very hard to read. Do you have any idea how the function could be modified in that regard? – erc Feb 20 '14 at 08:17
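One way to make the result easier to inspect for many groups is to stack everything into a single long data frame (one row per group, pair of types and position) instead of printing one `by()` block per group. The sketch below uses base R only; the function name `long_diffs` and the column names `pair`, `pos` and `diff` are hypothetical, and it assumes each group has at least two types and that every type within a group has the same number of rows. Unlike the accepted answer, it subtracts values position by position (1st d minus 1st e, 2nd d minus 2nd e), so the rows differ whenever the real values do.

```r
# Data from the question
group <- c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C", "C")
value <- c(3:6, 1:4, 4:9)
type  <- c("d", "d", "e", "e", "g", "g", "e", "e", "d", "d", "e", "e", "f", "f")
df <- data.frame(group, value, type, stringsAsFactors = FALSE)

# Hypothetical helper: for one group, return one row per (pair of types, position).
# Assumes at least two types per group and equal row counts for every type.
long_diffs <- function(x) {
  types <- unique(x$type)                                # order of first appearance
  vals  <- split(x$value, factor(x$type, levels = types))
  pairs <- combn(types, 2, simplify = FALSE)             # list of c(type1, type2)
  do.call(rbind, lapply(pairs, function(p) {
    data.frame(group = x$group[1],
               pair  = paste(p, collapse = "_"),
               pos   = seq_along(vals[[p[1]]]),          # 1st, 2nd, ... value of each type
               diff  = vals[[p[1]]] - vals[[p[2]]],      # position-wise type1 - type2
               stringsAsFactors = FALSE)
  }))
}

# Apply to every group and stack the results into one data frame
res <- do.call(rbind, lapply(split(df, df$group), long_diffs))
rownames(res) <- NULL
res
```

With the result in one table, `subset(res, group == "C")` shows a single group, and `reshape()` can turn the long table back into the wide layout from the question if that is preferred.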
