
I have two large tables, both of which have a matching column that looks like this:

> head(introns2$Name)
[1] "chr1:12058:12178" "chr1:12228:12612" "chr1:12698:12974" "chr1:12722:13220"
[5] "chr1:13053:13220" "chr1:13375:13452"

> head(sqtl2$cluster_pos)
[1] "chr1:259025:261550" "chr1:804222:807217" "chr1:804222:807217"
[4] "chr1:804222:807217" "chr1:804222:807217" "chr1:804222:807217"

Whenever I run the following command:

combined <- inner_join(sqtl2, introns2, by=c("cluster_pos"="Name"))

I get a combined table with 0 rows. So far, I have made sure that both columns are of the same type by converting introns2$Name to character, like so:

introns2$Name <- sapply(introns2$Name, as.character)

I have also tried a non-dplyr way of doing the same thing:

combined <- merge(x=sqtl2, y=introns3, by.x="cluster_pos", by.y="Name")

I am assuming that there are overlapping hits between these two tables, since they come from the same source and are each enormous in size:

> nrow(introns2)
[1] 357746
> nrow(sqtl2)
[1] 1537363

Is there anything that I am overlooking? Again, I just want to join the two tables together per row on the basis of matches found in these columns.

CelineDion
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. – MrFlick Nov 07 '19 at 16:31
  • Since R is a vectorized language, you can just call `as.character` on a vector instead of using `sapply` as a middleman. I wouldn't call 357000 rows enormous, and like @joran says, certainly wouldn't assume there are overlaps in identifiers made up of sequences of 10–12 numbers just based on that size – camille Nov 07 '19 at 16:44
  • A quick way of checking if you really have matches would be `sum(introns2$Name %in% sqtl2$cluster_pos)`. – MalditoBarbudo Nov 08 '19 at 18:56
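
The checks suggested in the comments can be sketched roughly as below. This is only an illustration on toy vectors, not the real `introns2` and `sqtl2` tables: the names `introns_name` and `sqtl_pos` are made up, and the trailing space in one key is a hypothetical cause of a zero-row join that the question does not confirm. It may simply be that the two tables share no keys at all.

```r
# Toy stand-ins for introns2$Name and sqtl2$cluster_pos (hypothetical data).
# One key carries a trailing space to mimic an invisible mismatch.
introns_name <- factor(c("chr1:12058:12178", "chr1:804222:807217 ", "chr1:13375:13452"))
sqtl_pos <- c("chr1:259025:261550", "chr1:804222:807217", "chr1:804222:807217")

# as.character() is vectorized, so sapply() is not needed for the conversion.
introns_name <- as.character(introns_name)

# Count how many keys in one column appear verbatim in the other.
n_exact <- sum(introns_name %in% sqtl_pos)

# Repeat the count after trimming leading/trailing whitespace on both sides.
n_trimmed <- sum(trimws(introns_name) %in% trimws(sqtl_pos))

# Join on the cleaned keys with base R merge(), mirroring the call in the question.
combined <- merge(
  x = data.frame(cluster_pos = trimws(sqtl_pos)),
  y = data.frame(Name = trimws(introns_name)),
  by.x = "cluster_pos", by.y = "Name"
)

print(c(exact = n_exact, trimmed = n_trimmed, joined_rows = nrow(combined)))
```

If the exact count on the real columns is 0 and stays 0 after trimming, the join is behaving correctly and the keys genuinely do not overlap.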

0 Answers