I've got two data frames, df1 and df2, which both contain 3 columns of genetic data. I am trying to figure out a way to combine (concatenate? merge? multiply?) the data frames to get all possible row combinations in a 6-column data frame.
For example, if df1 looks like this:
chr1 100 200
chr2 200 300
chr3 300 400
and df2 looks like this:
chr1 600 800
chr2 800 1000
I want the output to look like this:
chr1 100 200 chr1 600 800
chr1 100 200 chr2 800 1000
chr2 200 300 chr1 600 800
chr2 200 300 chr2 800 1000
chr3 300 400 chr1 600 800
chr3 300 400 chr2 800 1000
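For reference, one base R way to produce exactly this output is to build two vectors of row indices and bind the selected rows side by side. This is a minimal sketch, not the only approach. It rebuilds the two small example tables above, and the `_a`/`_b` column suffixes are a hypothetical choice to avoid six columns with duplicated names.

```r
# Rebuild the two small example tables (3 rows and 2 rows)
df1 <- data.frame(V1 = c("chr1", "chr2", "chr3"),
                  V2 = c(100, 200, 300),
                  V3 = c(200, 300, 400),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("chr1", "chr2"),
                  V2 = c(600, 800),
                  V3 = c(800, 1000),
                  stringsAsFactors = FALSE)

# Each df1 row index repeated once per df2 row: 1 1 2 2 3 3
i <- rep(seq_len(nrow(df1)), each = nrow(df2))
# The df2 row indices cycled once per df1 row: 1 2 1 2 1 2
j <- rep(seq_len(nrow(df2)), times = nrow(df1))

# Bind the selected rows side by side into one 6-column data frame
out <- cbind(df1[i, ], df2[j, ])
# Hypothetical suffixes so the six column names are unique
names(out) <- c(paste0(names(df1), "_a"), paste0(names(df2), "_b"))
rownames(out) <- NULL
out
```

Pairing `each =` on the df1 indices with `times =` on the df2 indices is what yields the df1-major row order shown above.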
So basically, each 3-column row in df1 is paired with each 3-column row in df2. The logic works like this:
If df1 has values:
A
B
C
And df2 has values:
5
6
The output should be:
A 5
A 6
B 5
B 6
C 5
C 6
Except, of course, each value (A, B, C, 5, or 6) actually has 3 pieces of information (3 columns). I've tried following two existing posts, "merge two data frames with all combinations" and "combine two data frames with all possible combinations", but so far have been unsuccessful. I think reshaping with melt might work, but I wasn't able to reshape the result back to the original format.
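For the single-value version of the problem (A/B/C against 5/6), base R's `expand.grid()` builds this kind of all-combinations table directly. A minimal sketch; the column names `df1` and `df2` here are only illustrative labels, not the real data frames.

```r
# expand.grid() varies its FIRST argument fastest, so the 5/6 values go
# first to get the A 5, A 6, B 5, ... order shown above
g <- expand.grid(df2 = c(5, 6), df1 = c("A", "B", "C"),
                 stringsAsFactors = FALSE)

# Put the A/B/C column first to match the layout of the desired output
g <- g[, c("df1", "df2")]
g
```

With full 3-column rows, the same idea still applies if you expand row indices (for example `expand.grid(seq_len(nrow(df2)), seq_len(nrow(df1)))`) and then use those indices to pick rows from each data frame.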
Also, this needs to work for two data frames with different lengths (number of rows).
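A Cartesian product does not require the two data frames to line up, because it always returns `nrow(df1) * nrow(df2)` rows (for the generated data below that would be 115 * 105 = 12075 rows). A minimal sketch using base R's `merge()` with `by = NULL`, which `?merge` documents as returning the Cartesian product. The `row1`/`row2` helper columns are hypothetical additions, used only to restore the df1-major row order from the question, because `merge()` itself cycles through df1's rows fastest.

```r
# Two toy data frames with different numbers of rows (3 and 2)
df1 <- data.frame(V1 = c("chr1", "chr2", "chr3"),
                  V2 = c(100, 200, 300),
                  V3 = c(200, 300, 400),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("chr1", "chr2"),
                  V2 = c(600, 800),
                  V3 = c(800, 1000),
                  stringsAsFactors = FALSE)

# Hypothetical helper columns recording each row's original position
df1$row1 <- seq_len(nrow(df1))
df2$row2 <- seq_len(nrow(df2))

# No key columns: merge() returns every df1 row paired with every df2 row;
# the shared names V1..V3 get the default suffixes ".x" (df1) and ".y" (df2)
out <- merge(df1, df2, by = NULL)

# Sort by df1 position, then df2 position, then drop the helper columns
out <- out[order(out$row1, out$row2), ]
out$row1 <- NULL
out$row2 <- NULL
rownames(out) <- NULL
out
```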
I'll post the code I am working with below. Any suggestions would be so appreciated! Thank you!
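# load packages: melt() and dcast() come from reshape2, full_join() from dplyr
library(reshape2)
library(dplyr)

# generate some data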
start1 <- seq(105000, 200000, by=20000)
stop1 <- start1+2000
chrs <- c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8",
"chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
"chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX")
x <- sort(rep(chrs, times=5))
df1 <- data.frame(V1=x, V2=rep(start1,times=23), V3=rep(stop1,times=23))
start2 <- seq(800000, 920000, by=25000)
stop2 <- start2+2000
df2 <- data.frame(V1=x, V2=rep(start2,times=23), V3=rep(stop2, times=23))
# remove the last 10 entries from df2 to test unequal nrow functionality
df2 <- df2[1:105,]
# attempt at melt
df1.b <- melt(df1)
df2.b <- melt(df2)
df3 <- full_join(df1.b, df2.b)
df3 <- na.omit(df3)
# error here
df3.b <- dcast(df3 ~ V1 + V2 + V3 ~ variable)