Let's suppose I have a data frame with the following parameters
DATA <- data.frame(ROWID, ID1, NAME1, ...IDn, NAMEn)
Sample of what the data might look like:
ROWID | ID1 | NAME1 | ID2 | NAME2 | IDn | NAMEn
001 | 001 | FAS | 002 | MAS | 999 | ZOO
002 | 003 | BIN | 004 | DUN | 998 | SOO
Where I have 201 columns by 10k+ rows. What I would like to do is to reshape this data such that for each row in the original DATA, I produce a set of rows in a subsequent data frame. Each row would consist of the originating ROWID, IDa, NAMEa, IDb, NAMEb pairs such that the first is matched with all others (99 pairs containing ID1, 98 with ID2, and so on). This would occur for each row producing a large data frame of all possible combinations within rows for every row. The result would look like:
ROWID1 | ID1 | NAME1 | ID2 | NAME2
ROWID1 | ID1 | NAME1 | ID3 | NAME3
...
ROWID1 | ID2 | NAME2 | ID3 | NAME3
...
ROWID2 | ID1 | NAME1 | ID2 | NAME2
ROWID2 | ID1 | NAME1 | ID3 | NAME3
...
The code I produced to do this is as follows. It works great, but only on smaller data frames. The full data frame is painfully slow, and I am hoping to have alternatives to speed it up using functions or something else of which I am unaware. Thanks in advance!!
DATA <- data.frame(as described above)
META <- data.frame(ROWID=numeric(0),ID1=numeric(0),
BUS1=character(0),ID2=numeric(0),BUS2=character(0))
for (i in 1:length(DATA$ROWID)) {
SET <- data.frame(ROWID=numeric(0),ID1=numeric(0),
BUS1=character(0),ID2=numeric(0),BUS2=character(0))
ROWID <- DATA[i,1]
for (x in seq(3,ncol(DATA),2)) {
for (y in seq(x,ncol(DATA),2)) {
ID1 <- DATA[i,x-2]
BUS1 <- DATA[i,x]
ID2 <- DATA[i,y-2]
BUS2 <- DATA[i,y]
if (!is.na(BUS1) && !is.na(BUS2)) {
NEW <- cbind(ROWID, ID1, BUS1, ID2, BUS2)
SET <- rbind(SET, NEW)
}
}
}
META <- rbind(META, SET)
}