
I've got two data frames, df1 and df2, which both contain 3 columns of genetic data. I am trying to figure out a way to concatenate (merge? multiply?) the data frames so that I get every possible combination of their rows in a 6-column data frame.

For example, if df1 looks like this:

chr1 100 200 
chr2 200 300
chr3 300 400

and df2 looks like this:

chr1 600 800
chr2 800 1000

I want the output to look like this:

chr1 100 200 chr1 600 800
chr1 100 200 chr2 800 1000
chr2 200 300 chr1 600 800
chr2 200 300 chr2 800 1000
chr3 300 400 chr1 600 800
chr3 300 400 chr2 800 1000

So basically, each 3-col row in df1 is combined with each 3-col row in df2. The logic works like this:

If df1 has values:

A 
B
C

And df2 has values:

5
6

The output should be:

A 5
A 6
B 5
B 6
C 5
C 6

Except of course, each value (A, B, C, 5, or 6) has 3 pieces of information (3 cols). I've tried following these two posts, "merge two data frames with all combinations" and "combine two data frames with all possible combinations", but so far I have been unsuccessful. I think reshaping with melt might work, but I wasn't able to reshape the result back to the original format.

Also, this needs to work for two data frames with different lengths (number of rows).
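One way to get this behaviour in base R, sketched here on the small chr example above, is to build every pair of row indices with `expand.grid` and then bind the indexed rows side by side. This is only an illustration under assumptions: the `.a`/`.b` column suffixes and the names `idx` and `combined` are made up for the sketch, and nothing in it depends on the two data frames having the same number of rows.

```r
# Toy versions of df1 and df2 from the chr example above
df1 <- data.frame(V1 = c("chr1", "chr2", "chr3"),
                  V2 = c(100, 200, 300),
                  V3 = c(200, 300, 400))
df2 <- data.frame(V1 = c("chr1", "chr2"),
                  V2 = c(600, 800),
                  V3 = c(800, 1000))

# All (i, j) row-index pairs. expand.grid varies its first argument fastest,
# so listing j first keeps each df1 row grouped with all of the df2 rows.
idx <- expand.grid(j = seq_len(nrow(df2)), i = seq_len(nrow(df1)))

# Rename the columns so the two halves do not clash
# (the .a/.b suffixes are hypothetical, chosen only for this sketch)
a <- setNames(df1[idx$i, ], paste0(names(df1), ".a"))
b <- setNames(df2[idx$j, ], paste0(names(df2), ".b"))

combined <- cbind(a, b)
rownames(combined) <- NULL
```

The result has `nrow(df1) * nrow(df2)` rows, so for the 115-row and 105-row data frames generated below it would have 12,075 rows.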

I'll post the code I am working with below. Any suggestions would be so appreciated! Thank you!

# generate some data
start1 <- seq(105000, 200000, by=20000)
stop1 <- start1+2000
chrs <- c("chr1", "chr2", "chr3", "chr4", "chr5", "chr6", "chr7", "chr8", 
          "chr9", "chr10", "chr11", "chr12", "chr13", "chr14", "chr15", "chr16",
          "chr17", "chr18", "chr19", "chr20", "chr21", "chr22", "chrX")
x <- sort(rep(chrs, times=5))

df1 <- data.frame(V1=x, V2=rep(start1,times=23), V3=rep(stop1,times=23))

start2 <- seq(800000, 920000, by=25000)
stop2 <- start2+2000

df2 <- data.frame(V1=x, V2=rep(start2,times=23), V3=rep(stop2, times=23))

# remove the last 10 entries from df2 to test unequal nrow functionality
df2 <- df2[1:105,]

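# attempt at melt
# (assuming melt/dcast come from reshape2 and full_join from dplyr)
library(reshape2)
library(dplyr)
df1.b <- melt(df1)
df2.b <- melt(df2)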

df3 <- full_join(df1.b, df2.b)

df3 <- na.omit(df3)

# error here
df3.b <- dcast(df3 ~ V1 + V2 + V3 ~ variable)
Doda
Try `full_join(df1, df2, by = character())`. If you just use `full_join` it will look for common column names and join with those, so you won't get combinations where V1 (or V2 or V3) are different between the two data frames. `by = character()` tells dplyr to join every row of df1 to every row of df2, since the key column is identical for all data. – Jon Spring Jan 25 '23 at 23:10
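A minimal sketch of the cross join this comment describes, run on the small chr example from the question. It assumes dplyr 1.1.0 or later, where (to my understanding) `cross_join()` is the dedicated function for this and the `by = character()` form still works but gives a deprecation warning; on older dplyr versions the call from the comment should be used instead. The name `combined` is just for illustration.

```r
library(dplyr)

# Toy versions of df1 and df2 from the question's chr example
df1 <- data.frame(V1 = c("chr1", "chr2", "chr3"),
                  V2 = c(100, 200, 300),
                  V3 = c(200, 300, 400))
df2 <- data.frame(V1 = c("chr1", "chr2"),
                  V2 = c(600, 800),
                  V3 = c(800, 1000))

# Pair every row of df1 with every row of df2.
# Clashing column names get dplyr's default suffixes (.x for df1, .y for df2).
# Older dplyr equivalent: full_join(df1, df2, by = character())
combined <- cross_join(df1, df2)
```

Because no key columns are matched, the two data frames can have any number of rows; the result always has `nrow(df1) * nrow(df2)` rows and all six columns.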
