Subsetting based on two columns from two dataframes where the column names are different

Question

I have two data frames,

df1,
       chr     start       end
3676  chr1    793962  95298066
2913 chr20  13200929  13200929

and

df2
               chr         pos      strand      fit
cg15903280        chr1    793962           - -0.42120400
cg16619049        chr1    805541           + -0.05317789

I need to combine the two data frames by matching on two columns: chr and start from df1 against chr and pos from df2. I tried using subset:

head(subset(df1, chr %in% df2$chr & start %in% df2$pos))

But I don't get all the columns from df2 in the resulting rows. Any suggestions would be great.

In the end, I need the row names from df2 to be carried over to the resulting data frame, like this:

               chr       pos      strand      fitted                     
cg15903280        chr1    793962           - -0.42120400
cg16619049        chr1    805541           + -0.05317789

Please show the result you expect (I see no matches between the example data frames given). From `dplyr` try something like `inner_join(df1, df2, by =c("chr" = "chr", "start" = "pos"))` — Andrew Lavers, Apr 25 '17 at 11:28
In addition to the good suggestions above: I would not use row names to carry information, but first add a column `df2$id = rownames(df2)`, and then do the merging. — juod, Apr 25 '17 at 11:30
I'm not sure but from the context I'm going to guess you are actually looking for `left_join(df2,df1, by="chr") %>% filter(between(pos, start, end))` — Stephen Henderson, Apr 25 '17 at 11:32
@StephenHenderson that would make "almost" cross join, OP wants to join on chr and position, then there would be no need for `filter` step. — zx8754, Apr 25 '17 at 11:34
@zx8754 Maybe? But their example data doesn't make sense that way. — Stephen Henderson, Apr 25 '17 at 11:38
@StephenHenderson it does, there is a typo in the example. See expected output, they are matching: `chr1 793962` to `chr1 793962 - -0.42120400` — zx8754, Apr 25 '17 at 11:41
@StephenHenderson, actually it may work if I change the column names, but not with the current ones. As you can see, each data frame has different column names. — ARJ, Apr 25 '17 at 11:43
@user1017373 joins have `by` and `by.x` and `by.y` arguments, column names do not have to match. Also, in **df1** I think `chr5` should be `chr1` in your example. — zx8754, Apr 25 '17 at 11:44
@zx8754 yes, you are right. But inner_join is still throwing an error: `Error: x and y don't share the same src. Set copy = TRUE to copy y into x's source (this may be time consuming).` — ARJ, Apr 25 '17 at 11:46
Relevant post: http://stackoverflow.com/questions/24480031/roll-join-with-start-end-window — zx8754, Apr 25 '17 at 11:48
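
For reference, the suggestions in the comments can be put together in base R roughly as follows. This is only a sketch under assumptions: the two data frames are rebuilt by hand from the values shown in the question, `id` is a made-up column name for carrying the row names (as juod suggests), and the object names `exact`, `by_chr` and `in_range` are purely illustrative. It uses `merge` with `by.x`/`by.y` rather than `inner_join`, so it does not address the `src` error reported above.

```r
# Example data rebuilt by hand from the values shown in the question.
df1 <- data.frame(
  chr   = c("chr1", "chr20"),
  start = c(793962, 13200929),
  end   = c(95298066, 13200929),
  row.names = c("3676", "2913"),
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  chr    = c("chr1", "chr1"),
  pos    = c(793962, 805541),
  strand = c("-", "+"),
  fit    = c(-0.42120400, -0.05317789),
  row.names = c("cg15903280", "cg16619049"),
  stringsAsFactors = FALSE
)

# merge() discards row names, so copy them into an ordinary column first.
# "id" is a hypothetical column name chosen for this sketch.
df2$id <- rownames(df2)

# Reading 1: exact match on chromosome and position.
# by.x / by.y let the key columns have different names in each data frame.
exact <- merge(df1, df2,
               by.x = c("chr", "start"),
               by.y = c("chr", "pos"))
rownames(exact) <- exact$id

# Reading 2: keep df2 rows whose pos lies inside a df1 [start, end] interval.
# Joining on chr alone pairs every df2 row with every df1 row on the same
# chromosome, so this can become large on real data.
by_chr   <- merge(df2, df1, by = "chr")
in_range <- by_chr[by_chr$pos >= by_chr$start & by_chr$pos <= by_chr$end, ]
rownames(in_range) <- in_range$id
```

Which of the two readings is wanted depends on what the question intends: with this example data the exact match keeps only the `cg15903280` row, while the interval filter keeps both `chr1` probes. For large inputs, the rolling/overlap join approaches in the linked post are likely a better fit than the join on `chr` followed by a filter.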
