I want to intersect two data frames based on two columns. For a single column I can do that with the intersect function, but how do I go about it for two columns?

Here is my first sample data frame:

head(Region)
          ENSEMBL UP_DOWN
1 ENSG00000000457      UP
2 ENSG00000000460      UP
3 ENSG00000000938      UP
4 ENSG00000000971      UP
5 ENSG00000001084    DOWN
6 ENSG00000001460      UP

The second data frame:

head(gene)
          ENSEMBL UP_DOWN
1 ENSG00000000003    DOWN
2 ENSG00000000938      UP
3 ENSG00000001630    DOWN
4 ENSG00000002822    DOWN
5 ENSG00000004059    DOWN
6 ENSG00000004139    DOWN

So far, this is what I'm doing:

c <- as.data.frame(intersect(Region$ENSEMBL,gene$ENSEMBL))

But I lose the information about whether the respective row is "UP" or "DOWN" in each of my data frames. How do I keep that label?

PesKchan
  • The options in this link should work as well: https://stackoverflow.com/questions/32917934/how-to-find-common-rows-between-two-dataframe-in-r – Ronak Shah Jun 02 '21 at 06:38

2 Answers

You could do an inner join:

library(dplyr)

inner_join(Region, gene, by = c('ENSEMBL','UP_DOWN'))

          ENSEMBL UP_DOWN
1 ENSG00000000938      UP

Waldi
  • But would it tell me whether the row is "UP" in both of my data frames? – PesKchan Jun 02 '21 at 06:03
  • I checked a few of them and they match, but if a row is "UP" in one data frame and "DOWN" in the other, it won't appear in the output. – PesKchan Jun 02 '21 at 06:06
  • You could join by 'ENSEMBL' only: you'll see both sides. I joined by `c('ENSEMBL','UP_DOWN')` because you asked for an intersect. – Waldi Jun 02 '21 at 06:26
  • We can use the suffix argument to name the repeated columns: inner_join(Region, gene, by = c('ENSEMBL'), suffix = c('_Region', '_Gene')) – jpdugo17 Jun 02 '21 at 06:32
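
Putting the comments above together, here is a minimal sketch of joining on 'ENSEMBL' only and keeping both direction labels. The two data frames are rebuilt from the head() output in the question; the SAME_DIRECTION column name is only an illustrative choice, not something from the original posts.

```r
library(dplyr)

# Toy data copied from the head() output shown in the question
Region <- data.frame(
  ENSEMBL = c("ENSG00000000457", "ENSG00000000460", "ENSG00000000938",
              "ENSG00000000971", "ENSG00000001084", "ENSG00000001460"),
  UP_DOWN = c("UP", "UP", "UP", "UP", "DOWN", "UP")
)
gene <- data.frame(
  ENSEMBL = c("ENSG00000000003", "ENSG00000000938", "ENSG00000001630",
              "ENSG00000002822", "ENSG00000004059", "ENSG00000004139"),
  UP_DOWN = c("DOWN", "UP", "DOWN", "DOWN", "DOWN", "DOWN")
)

# Join on the gene ID only; each data frame's UP_DOWN gets its own suffixed column
common <- inner_join(Region, gene, by = "ENSEMBL",
                     suffix = c("_Region", "_Gene"))

# Flag whether both data frames report the same direction
# (SAME_DIRECTION is a hypothetical column name)
common <- mutate(common, SAME_DIRECTION = UP_DOWN_Region == UP_DOWN_Gene)
common
```

With this approach a gene that is "UP" in one data frame and "DOWN" in the other should still appear in the result, with SAME_DIRECTION set to FALSE, instead of being dropped.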

A base R option with merge may help

> merge(Region, gene)
          ENSEMBL UP_DOWN
1 ENSG00000000938      UP
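
When called without `by`, merge joins on every column name the two data frames share, which is why the call above matches on both ENSEMBL and UP_DOWN. As a rough sketch of the base R equivalent of joining on the ID alone, assuming the same sample rows as in the question, `by` and `suffixes` can be set explicitly (the suffix strings are an illustrative choice):

```r
# Toy data copied from the head() output shown in the question
Region <- data.frame(
  ENSEMBL = c("ENSG00000000457", "ENSG00000000460", "ENSG00000000938",
              "ENSG00000000971", "ENSG00000001084", "ENSG00000001460"),
  UP_DOWN = c("UP", "UP", "UP", "UP", "DOWN", "UP")
)
gene <- data.frame(
  ENSEMBL = c("ENSG00000000003", "ENSG00000000938", "ENSG00000001630",
              "ENSG00000002822", "ENSG00000004059", "ENSG00000004139"),
  UP_DOWN = c("DOWN", "UP", "DOWN", "DOWN", "DOWN", "DOWN")
)

# Match on the gene ID only; the shared UP_DOWN column is kept twice,
# once per data frame, distinguished by the suffixes
both <- merge(Region, gene, by = "ENSEMBL",
              suffixes = c("_Region", "_Gene"))
both

# Keep only the genes whose direction agrees in both data frames
agree <- both[both$UP_DOWN_Region == both$UP_DOWN_Gene, ]
```

Filtering `both` afterwards, as in the last line, gives the same genes as the two-column intersect while keeping both labels visible.
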
ThomasIsCoding