I have two data frames,

df1
       chr    start      end
3676  chr1   793962 95298066
2913 chr20 13200929 13200929

and

df2
            chr    pos strand         fit
cg15903280 chr1 793962      - -0.42120400
cg16619049 chr1 805541      + -0.05317789

I need to combine these two data frames by matching two pairs of columns: chr and start from df1 against chr and pos from df2. I tried using subset:

head(subset(df1, chr %in% df2$chr & start %in% df2$pos))

But I don't get all the columns from df2 in the resulting rows. Any suggestions would be great.

In the end, I need the row names from df2 to be appended to the resulting data frame. Like this,

            chr    pos strand      fitted
cg15903280 chr1 793962      - -0.42120400
cg16619049 chr1 805541      + -0.05317789
ARJ
  • Please show the result you expect (I see no matches between the example data frames given). From `dplyr` try something like `inner_join(df1, df2, by =c("chr" = "chr", "start" = "pos"))` – Andrew Lavers Apr 25 '17 at 11:28
  • In addition to the good suggestions above: I would not use row names to carry information, but first add a column `df2$id = rownames(df2)`, and then do the merging. – juod Apr 25 '17 at 11:30
  • I'm not sure but from the context I'm going to guess you are actually looking for `left_join(df2,df1, by="chr") %>% filter(between(pos, start, end))` – Stephen Henderson Apr 25 '17 at 11:32
  • @StephenHenderson that would make "almost" cross join, OP wants to join on chr and position, then there would be no need for `filter` step. – zx8754 Apr 25 '17 at 11:34
  • @zx8754 Maybe? But their example data doesn't make sense that way. – Stephen Henderson Apr 25 '17 at 11:38
  • @StephenHenderson it does, there is a typo in the example. See expected output, they are matching: `chr1 793962` to `chr1 793962 - -0.42120400` – zx8754 Apr 25 '17 at 11:41
  • @StephenHenderson, actually it may work if I change the column names, but not with the current ones. As you can see, each data frame has different column names. – ARJ Apr 25 '17 at 11:43
  • @user1017373 joins have `by` and `by.x` and `by.y` arguments, column names do not have to match. Also, in **df1** I think `chr5` should be `chr1` in your example. – zx8754 Apr 25 '17 at 11:44
  • @zx8754 yes, you are right. But inner_join still throws an error: `Error: x and y don't share the same src. Set copy = TRUE to copy y into x's source (this may be time consuming).` – ARJ Apr 25 '17 at 11:46
  • Relevant post: http://stackoverflow.com/questions/24480031/roll-join-with-start-end-window – zx8754 Apr 25 '17 at 11:48
  • Make the data and error reproducible. – zx8754 Apr 25 '17 at 11:48
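
The suggestions in the comments (copy the row names into a column, then join on differently named key columns) can be sketched in base R roughly as follows. This is only a sketch under assumptions: the data frames are rebuilt by hand from the two example rows in the question, the column name `id` is an arbitrary choice, and an exact match of `start` against `pos` is assumed rather than an interval match between `start` and `end`.

```r
# Example data retyped from the question (only the rows shown there).
df1 <- data.frame(
  chr   = c("chr1", "chr20"),
  start = c(793962, 13200929),
  end   = c(95298066, 13200929),
  row.names = c("3676", "2913"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  chr    = c("chr1", "chr1"),
  pos    = c(793962, 805541),
  strand = c("-", "+"),
  fit    = c(-0.42120400, -0.05317789),
  row.names = c("cg15903280", "cg16619049"),
  stringsAsFactors = FALSE
)

# merge() does not keep row names, so copy them into a regular column first.
# The name `id` is a hypothetical choice, not something from the question.
df2$id <- rownames(df2)

# Inner join: df1$chr must equal df2$chr and df1$start must equal df2$pos.
# by.x / by.y allow the key columns to have different names.
merged <- merge(df1, df2,
                by.x = c("chr", "start"),
                by.y = c("chr", "pos"))

# Optionally put the IDs back as row names (they must be unique for this).
rownames(merged) <- merged$id

print(merged)
```

With this example data only the `chr1` / `793962` pair matches, so `merged` has a single row. The key columns keep their `df1` names (`chr`, `start`); the `pos` column from `df2` is not repeated. If rows of `df1` without a match should be kept as well, `all.x = TRUE` can be passed to `merge()`.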
