
I have a data frame consisting of 2 variables. Both can take only the values 1 or 2, so there are only 4 possible combinations (groups). I want to separate the groups from each other. My idea was to generate all possible combinations with expand.grid and compare each combination with the data frame. Since this must be done a couple of times, I want to use lapply. For this reason I created one list with the data frame as its only element and a second list with one element for each of the 4 possible combinations.

set.seed(1)
cbind(sample(1:2, 10, replace = TRUE),sample(1:2, 10, replace = TRUE))->pred
data.frame(pred)->pred
list(pred)->pred

expand.grid(1:2,1:2)->groups   
lapply(as.list(data.frame(t(groups))),t)->groups    

The data:

pred

   X1 X2
1   1  1
2   1  1
3   2  2
4   2  1
5   1  2
6   2  1
7   2  2
8   2  2
9   2  1
10  1  2

groups

$X1
      [,1] [,2]
[1,]    1    1

$X2
      [,1] [,2]
[1,]    2    1

$X3
     [,1] [,2]
[1,]    1    2

$X4
     [,1] [,2]
[1,]    2    2

Here is the thing that puzzles me:

pred[[1]]==groups[[1]]
       X1    X2
 [1,]  TRUE  TRUE
 [2,]  TRUE  TRUE
 [3,] FALSE FALSE
 [4,] FALSE  TRUE
 [5,]  TRUE FALSE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE  TRUE
[10,]  TRUE FALSE

pred[[1]]==groups[[2]]
         X1    X2
 [1,] FALSE FALSE
 [2,]  TRUE  TRUE
 [3,]  TRUE  TRUE
 [4,] FALSE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,]  TRUE  TRUE
 [8,] FALSE FALSE
 [9,]  TRUE FALSE
[10,]  TRUE FALSE

In the first case it worked and in the second case it did not. What is wrong with the code and is there possibly a better solution for my problem?

Alex
  • What is your desired output? – David Arenburg Aug 03 '15 at 20:36
  • The output below in the answer is what I wanted. However, I am still wondering why 'pred[[1]]==groups[[2]]' delivers such a strange result. For – Alex Aug 03 '15 at 21:04
  • I somehow don't get it. `pred[[1]]==groups[[1]]` compares the first row of `pred[[1]]` (1,1) with `groups[[1]]` (1,1). The result is (TRUE, TRUE). That holds for all rows. However, `pred[[1]]==groups[[2]]` is doing something else. Comparing the first row of `pred[[1]]` (1,1) with `groups[[2]]` (2,1) delivers (TRUE, TRUE) instead of (FALSE, TRUE). – Alex Aug 03 '15 at 21:19
  • Yes, it seems like `pred[[1]]` is being unlisted and each two values being compared against `2:1` which is being recycled all the time – David Arenburg Aug 03 '15 at 21:26
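
To make the recycling described in the comment above concrete, here is a minimal base-R sketch. It rebuilds the matrix that the comparison is effectively made against, and then shows one possible way to do the intended row-wise comparison instead. The small data frame `df` (the first four rows of the question's data) and the helper name `rows_equal` are assumptions made for illustration; they are not part of the original code.

```r
# Illustrative data: the first four rows of the question's data frame
df <- data.frame(X1 = c(1, 1, 2, 2), X2 = c(1, 1, 2, 1))
g  <- matrix(c(2, 1), nrow = 1)   # same shape as groups[[2]]

# The two values in g are recycled down each column, so every column is
# compared against 2, 1, 2, 1, ... instead of row i being compared against (2, 1)
recycled    <- matrix(rep_len(as.vector(g), nrow(df) * ncol(df)), nrow = nrow(df))
elementwise <- as.matrix(df) == recycled

# Hypothetical helper for the intended comparison: sweep() compares column j
# with target[j]; a row matches only if all of its columns match
rows_equal <- function(data, target) {
  hits <- sweep(as.matrix(data), 2, as.vector(target), `==`)
  rowSums(hits) == ncol(data)
}

matches <- rows_equal(df, g)   # only the fourth row, (2, 1), matches
```

With a helper like this, something along the lines of `lapply(groups, function(g) pred[[1]][rows_equal(pred[[1]], g), ])` should give one data frame per combination, under the assumption that `groups` and `pred` are built as in the question.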

2 Answers


You don't need to wrap the data.frame in a list; you can work on it directly. This seems like a perfect place to use .GRP from data.table:

library(data.table)
# pred must be the data.frame itself here, not the list created by list(pred)
setDT(pred)[, grp := .GRP, by = .(X1, X2)][]
    X1 X2 grp
 1:  1  1   1
 2:  1  1   1
 3:  2  2   2
 4:  2  1   3
 5:  1  2   4
 6:  2  1   3
 7:  2  2   2
 8:  2  2   2
 9:  2  1   3
10:  1  2   4
MichaelChirico
  • I think it's from the `list(pred)->pred` code. Not sure what that's about. bit of discrepancy in that OP said he has a `data.frame` but there declares it as a `list`. – MichaelChirico Aug 03 '15 at 20:32
  • @akrun could help to say what error you're getting – Señor O Aug 03 '15 at 20:36
  • You'll want to `setDT(pred[[1]])` since the op put the data frame *inside* a list! @SeñorO I also see an error: "Error in FUN(X[[i]], ...) : Invalid column: it has dimensions." – Frank Aug 03 '15 at 20:37
  • Yeah. strange that neither `setDT(pred)` nor `data.table(pred)` work, considering `data.frame(pred)` does what you'd think. – MichaelChirico Aug 03 '15 at 20:38
  • Another alternative is `rbindlist(pred)` which automatically sets `data.table` as the class of the result. – MichaelChirico Aug 03 '15 at 20:45
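
The comments above come down to `pred` being a list that holds the data frame rather than being the data frame itself. A small base-R sketch of that difference follows. The data is made up for illustration, and `split()` is only one possible way to separate the groups once the data frame has been extracted; it is not the approach used in the answer.

```r
# Made-up data with the same column names as in the question
df   <- data.frame(X1 = c(1, 2, 2, 1), X2 = c(1, 1, 2, 1))
pred <- list(df)          # what list(pred) -> pred produces in the question

is.data.frame(pred)       # FALSE: a plain list of length 1
is.data.frame(pred[[1]])  # TRUE: the data frame is the list's first element

# With the data frame extracted, split() returns one data frame per
# combination of X1 and X2; drop = TRUE omits combinations that never occur
inner    <- pred[[1]]
by_group <- split(inner, list(inner$X1, inner$X2), drop = TRUE)
names(by_group)
```

Each element of `by_group` is an ordinary data frame, so it can be passed on to `lapply` directly.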

Here's a non-data.table solution, where d is a data frame with columns X1 and X2.

d$group <- factor(paste0(d$X1, d$X2), labels=1:4)
d
   X1 X2 group
1   1  2     2
2   2  2     4
3   1  1     1
4   1  2     2
5   1  2     2
6   1  2     2
7   2  1     3
8   2  2     4
9   1  1     1
10  2  2     4
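
One caveat with `labels=1:4` is that it relies on all four combinations being present: with only three distinct values, `factor()` stops with an "invalid 'labels'" error. A possible variation, sketched here with made-up data and an explicitly assumed set of levels, fixes the levels up front so the group numbers stay the same regardless of which combinations occur.

```r
# Hypothetical data in which the combination (2, 1) never occurs
d <- data.frame(X1 = c(1, 1, 2, 1), X2 = c(1, 2, 2, 2))

# Assumed full set of combinations, in the same sorted order that
# factor() would produce if all four were present in the data
all_keys <- c("11", "12", "21", "22")

# Explicit levels keep the numbering stable: "11" -> 1, "12" -> 2,
# "21" -> 3, "22" -> 4, even though "21" is absent here
d$group <- factor(paste0(d$X1, d$X2), levels = all_keys, labels = 1:4)
d
```

The unused level 3 is kept in the factor, which is usually harmless; `droplevels()` removes it if that is not wanted.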
Josh