I have a data.frame like this:

x1 <- data.frame(id=1:3,item=c("A","B","A","B","C","D"))
x1[order(x1$item),]
  id item
1  1    A
3  3    A
2  2    B
4  1    B
5  2    C
6  3    D

I want to get:

id1 <- c(1, 2, 1, 3, 2, 3)
id2 <- c(2, 1, 3, 1, 3, 2)
A <- c(0, 0, 1, 1, 0, 0)
B <- c(1, 1, 0, 0, 0, 0)
C <- 0
D <- 0
datawanted <- data.frame(id1, id2, A, B, C, D)
  id1 id2 A B C D
1   1   2 0 1 0 0
2   2   1 0 1 0 0
3   1   3 1 0 0 0
4   3   1 1 0 0 0
5   2   3 0 0 0 0
6   3   2 0 0 0 0

If person 1 and person 2 both have B, then column B in the datawanted data frame gets 1; otherwise it gets 0. The same rule applies to every other item column.

Can someone suggest an approach or some R functions to deal with this problem?

Frank
chunjin

1 Answer

Cool question. You have a bipartite graph, so following Gabor's tutorial...

library(igraph)
g = graph_from_edgelist(as.matrix(x1))
V(g)$type = grepl("[A-Z]", V(g)$name)

For OP's desired output, first we can extract the incidence matrix:

gi = get.incidence(g)
#   A B C D
# 1 1 1 0 0
# 2 0 1 1 0
# 3 1 0 0 1

Note (thanks @thelatemail) that if you don't want to use igraph, you can get gi as table(x1).
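To illustrate the base-R route, here is a minimal sketch that rebuilds the question's x1 and cross-tabulates it. The name has_item is hypothetical and not part of the original answer; it only matters if the same id can list the same item more than once, which the example data does not do.

```r
# Rebuild the question's data; rep(1:3, 2) spells out the recycling of id.
x1 <- data.frame(id = rep(1:3, 2), item = c("A", "B", "A", "B", "C", "D"))

# Cross-tabulate id by item: one row per id, one column per item.
gi <- table(x1)

# Cells are counts, so a repeated (id, item) pair would give a value above 1.
# Assumption: only presence matters, so clamp to 0/1 (hypothetical name).
has_item <- (gi > 0) + 0L
```

With the example data every count is already 0 or 1, so has_item and gi hold the same values.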

Then, we look at the combinations of ids:

res = t(combn(nrow(gi), 2, function(x) c(
    as.integer(rownames(gi)[x]), 
    pmin( gi[x[1], ], gi[x[2], ] ) 
)))

dimnames(res) <- list( NULL, c("id1", "id2", colnames(gi)))
#      id1 id2 A B C D
# [1,]   1   2 0 1 0 0
# [2,]   1   3 1 0 0 0
# [3,]   2   3 0 0 0 0

This essentially is the OP's desired output. They had included redundant rows (e.g., 1,2 and 2,1).
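If the redundant rows are actually wanted, one possible variation is to enumerate ordered pairs with expand.grid instead of combn. This is only a sketch, not part of the original answer: the names m, pairs, shared and res_both are made up for illustration, and as.integer() assumes the ids are integer-like. Rows are looked up by id label rather than by position, so ids such as c(1, 3, 4) should work too.

```r
x1 <- data.frame(id = rep(1:3, 2), item = c("A", "B", "A", "B", "C", "D"))

# Plain integer matrix of counts, with ids as row names and items as column names.
m <- unclass(table(x1))
ids <- rownames(m)

# Every ordered pair of distinct ids, so both (1, 2) and (2, 1) appear.
pairs <- expand.grid(id1 = ids, id2 = ids, stringsAsFactors = FALSE)
pairs <- pairs[pairs$id1 != pairs$id2, ]

# An item is shared when both ids have it: element-wise minimum of the two rows.
shared <- pmin(m[pairs$id1, , drop = FALSE], m[pairs$id2, , drop = FALSE])
rownames(shared) <- NULL

res_both <- data.frame(id1 = as.integer(pairs$id1),
                       id2 = as.integer(pairs$id2),
                       shared)
```

The rows come out in expand.grid order (id1 varies fastest), which differs from the ordering shown in the question; sort with order() if the order matters.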


Fun reason to use a graph (ht Chris):

V(g)$color <- ifelse(V(g)$type, "red", "light blue")
V(g)$x     <- (1:2)[ V(g)$type + 1 ]
V(g)$y     <- ave(seq_along(V(g)), V(g)$type, FUN = seq_along)
plot(g)

Or, apparently this can be done more or less like

plot(g, layout = layout.bipartite(g)[,2:1])
Frank

  • Isn't the first part of this just `table(x1)`? – thelatemail Aug 05 '16 at 03:42
  • @thelatemail Sure, but it's a graph, so might as well store it as one. If the OP isn't done with their analysis after this, they might take advantage of whatever other tools igraph has (... though I'm not that familiar with them myself). Good point, though, I've edited to reflect it. – Frank Aug 05 '16 at 03:43
  • Thanks. If the ids change to something like c(1,3,4), the method you give may cause a subscript out of bounds error. `combn` and `pmin` did help. I need to think about how this can work when the ids do not correspond to row numbers. – chunjin Aug 05 '16 at 07:56
  • I use code like this; could it cause any problems? `res[,1] <- x1$id[res[,1]]` `res[,2] <- x1$id[res[,2]]` – chunjin Aug 05 '16 at 08:11
  • @chunjin Good catch. Yes, I think your way works. I've also edited to show a different way above, changing the construction of `res` to use `as.integer(rownames(gi)[x])` instead of `x`. – Frank Aug 06 '16 at 13:15