11

This question is somewhat related to this one; however, I want to create an index from the unique combinations of two data.frame columns. My data structure looks, for example, like this (dput output):

structure(list(avg = c(0.246985988921473, 0.481522354272779, 
0.575400762275067, 0.14651009243539, 0.489308880181752, 0.523678968337178
), i_ID = c("H", "H", "C", "C", "H", "S"), j_ID = c("P", "P", 
"P", "P", "P", "P")), .Names = c("avg", "i_ID", "j_ID"), row.names = 7:12, class = "data.frame")

The index created for the above structure should therefore look like this:

1
1
2
2
1
3

In the example data the column j_ID always has the value P, but this isn't always the case. Furthermore, reversed combinations (S-P and P-S) should receive the same index.

Does anyone know a nice way to accomplish that? I can do it with a lot of for-loops and if-else statements, but that's not really elegant.

Curlew

2 Answers

6

The interaction function will work well:

foo = structure(list(avg = c(0.246985988921473, 0.481522354272779, 0.575400762275067, 0.14651009243539, 0.489308880181752, 0.523678968337178), i_ID = c("H", "H", "C", "C", "H", "S"), j_ID = c("P", "P", "P", "P", "P", "P")), .Names = c("avg", "i_ID", "j_ID"), row.names = 7:12, class = "data.frame")

foo$idx <- as.integer(interaction(foo$i_ID, foo$j_ID))

> foo
         avg i_ID j_ID idx
7  0.2469860    H    P   2
8  0.4815224    H    P   2
9  0.5754008    C    P   1
10 0.1465101    C    P   1
11 0.4893089    H    P   2
12 0.5236790    S    P   3
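One caveat, sketched on made-up vectors (the values below are illustrative and not taken from the question's data): interaction() builds a level for every ordered combination of the two inputs, so S.P and P.S end up with different codes. Passing drop = TRUE only removes unused combinations; it does not merge reversed pairs.

```r
# Hypothetical test vectors: the last pair (P, S) is the reverse of (S, P)
i_ID <- c("H", "C", "S", "P")
j_ID <- c("P", "P", "P", "S")

# One level per ordered combination of the two sets of values
full <- interaction(i_ID, j_ID)
# Only the combinations that actually occur in the data
used <- interaction(i_ID, j_ID, drop = TRUE)

nlevels(full)     # 4 distinct i_ID values times 2 distinct j_ID values
nlevels(used)     # one level per observed ordered pair
as.integer(used)  # the codes for S.P and P.S differ
```

So this approach is enough when the order of the two columns matters, but not when S-P and P-S have to share an index.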

Ah, I didn't read carefully enough. There is probably a more elegant solution, but you can use the outer function together with the upper and lower triangles of the resulting matrix:

# let's assign some test values
x <- c('a', 'b', 'c')
foo$idx <- c('a b', 'b a', 'b c', 'c b', 'a a', 'b a')

mat <- outer(x, x, FUN = 'paste') # gives all possible ordered combinations
mat_ok <- mat
# overwrite each lower-triangle entry with its mirrored upper-triangle entry;
# indexing t(mat) keeps the pairing correct for any number of values
mat_ok[lower.tri(mat)] <- t(mat)[lower.tri(mat)]

Then you can match indexes found in mat with those found in mat_ok:

foo$idx <- mat_ok[match(foo$idx, mat)]
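Put together on data shaped like the question's, the outer/triangle idea might look like the sketch below. The data frame here is a made-up variant with some reversed pairs, and the names ids, key and canon are illustrative only; the transpose is used so that each lower-triangle cell picks up its mirrored partner.

```r
# Hypothetical data: rows 1/2 and 3/4 are reversed versions of each other
foo <- data.frame(
  i_ID = c("H", "P", "C", "P", "H", "S"),
  j_ID = c("P", "H", "P", "C", "P", "P"),
  stringsAsFactors = FALSE
)

# All values that occur in either column, in sorted order
ids <- sort(unique(c(foo$i_ID, foo$j_ID)))

# Matrix of every ordered pair; cell [i, j] is "ids[i] ids[j]"
mat <- outer(ids, ids, FUN = paste)

# Canonical version: lower-triangle cells take the mirrored upper-triangle label
mat_ok <- mat
mat_ok[lower.tri(mat_ok)] <- t(mat)[lower.tri(mat)]

# Look up each row's ordered pair and translate it to its canonical label
key   <- paste(foo$i_ID, foo$j_ID)
canon <- mat_ok[match(key, mat)]

# Integer index: one code per canonical label
foo$idx <- as.integer(factor(canon))
```

Reversed pairs such as "P H" and "H P" map to the same canonical label and therefore to the same index.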
Justin
1

To add to Justin's answer: if you would like the indexes to follow the order in which the i_ID values first appear, you can assign the interaction() result to a variable and then reorder its levels.

x <- interaction(foo$i_ID, foo$j_ID) 
x <- factor(x, levels = unique(as.character(x)))  # levels in order of first appearance

foo$idx <- as.integer(x)

which gives:

> foo
         avg i_ID j_ID idx
7  0.2469860    H    P   1
8  0.4815224    H    P   1
9  0.5754008    C    P   2
10 0.1465101    C    P   2
11 0.4893089    H    P   1
12 0.5236790    S    P   3
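Both requirements, order-insensitive pairs and numbering by first appearance, can also be combined without interaction(). The following is a minimal sketch of an alternative technique, not the approach from either answer: it assumes character columns, sorts each pair with pmin()/pmax(), and numbers the resulting keys with match(). The extra seventh row (P, S) is added here purely to show a reversed pair.

```r
# Question-style data plus one hypothetical reversed row (P, S)
foo <- data.frame(
  i_ID = c("H", "H", "C", "C", "H", "S", "P"),
  j_ID = c("P", "P", "P", "P", "P", "P", "S"),
  stringsAsFactors = FALSE
)

# Put the smaller value of each pair first, so S-P and P-S give the same key
lo  <- pmin(foo$i_ID, foo$j_ID)
hi  <- pmax(foo$i_ID, foo$j_ID)
key <- paste(lo, hi, sep = "-")

# Number the keys in order of first appearance
foo$idx <- match(key, unique(key))
```

If either column is stored as a factor, convert it with as.character() first, since pmin() and pmax() are meant for numeric or character input.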
Ricardo Saporta