How to find common rows between two dataframe in R?

Question

I would like to make a new data frame which only includes common rows of two separate data.frame. example:

data.frame 1

1 id300
2 id2345
3 id5456
4 id33
5 id45
6 id54

data.frame2

1 id832
2 id300
3 id1000
4 id45
5 id984
6 id5456
7 id888

So I want my output be:

1 id300
2 id45
3 id5456

any suggestion please?

score 39 · Answer 1 · answered Apr 10 '17 at 23:05

39

The appropriate dplyr function here is inner_join (returns all rows from df x that have a match in df y.)

library(dplyr)
inner_join(df1, df2)

      V1
1  id300
2 id5456
3   id45

Note: the rows are returned in the order in which they are in df1. If you did inner_join(df2, df1), id45 would come before id5456.

answered Apr 10 '17 at 23:05

Joe

8,073
1
52
58

Is it possible to delete the intersection from df1? – Pharcyde Apr 07 '20 at 15:59
2

Yes, I think you would use `anti_join(df1, df2)`. Check out the function documentation for the specifics. – Joe Apr 07 '20 at 20:00

Navin Manaswi · Accepted Answer · 2015-10-03T02:32:44.010

29

common <- intersect(data.frame1$col, data.frame2$col)  
data.frame1[common,] # give you common rows in data frame 1  
data.frame2[common,] # give you common rows in data frame 2

edited Oct 03 '15 at 02:32

answered Oct 03 '15 at 01:19

Navin Manaswi

964
7
19

1

I also like `dplyr`s intersect (seperate package) – Helix123 Oct 03 '15 at 08:56
6

I think this answer is not answering the question correctly. It does not even generalize to data.frames with two or more columns. – raymkchow Jul 20 '17 at 08:12
Agree with @raymkchow. Works fine for the specific example where there's only one column per dataframe, but would be nice to have a more general solution – jruf003 Mar 28 '19 at 23:57
Beware of `intersect` ! If you happen to have a certain value show up twice in both dataframes, you will only get one of those rows returned. You may or may not want this effect. – Carl Witthoft Aug 24 '20 at 17:56

R. Schifini · Answer 3 · 2015-10-03T02:10:47.490

Use merge

new_data_frame <- merge(data.frame1, data.frame2)

I'm assuming you have only one column in each data frame and they have the same name in both frames. If not use the column you want to intersect by with by.x = "nameCol1" and by.y = "nameCol2", where nameCol are the real column names.

Added after first comment
If you have more columns in any data frame the command is the same. Do it this way:

>a  #Data frame 1
      c1 c2
1  id300  6
2 id2345  5
3 id5456  4
4   id33  3
5   id45  2
6   id54  1

> b #Data frame 2
     a  f
1  asd 12
2 id33 10
3 id45  8
4 id54  6

As you may see, they don't share column names and have 2 columns each. So:

> merge(a,b, by.x = "c1", by.y = "a")

    c1 c2  f
1 id33  3 10
2 id45  2  8
3 id54  1  6

The only rows that are left are those that have the same entries in common in the selected columns.

if I have more than one column in secound data frame and if I want them to show up in new data frame then how should I change this command? — zara, Oct 03 '15 at 01:37

score 5 · Answer 4 · answered Jun 09 '17 at 22:27

5

We can also do this with fintersect from data.table after converting the data.frame to data.table

library(data.table)
fintersect(setDT(df1), setDT(df2))
#       v1
#1:  id300
#2:   id45
#3: id5456

data

df1 <- structure(list(v1 = c("id300", "id2345", "id5456", "id33", "id45", 
"id54")), .Names = "v1", class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6"))

df2 <- structure(list(v1 = c("id832", "id300", "id1000", "id45", "id984", 
"id5456", "id888")), .Names = "v1", class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7"))

answered Jun 09 '17 at 22:27

akrun

874,273
37
540
662

how to do the same with multiple columns in a single dataframe ? I have 37 columns which are unequal in length ... – kcm Sep 15 '20 at 09:31
@krushnachChandra Not clear about your issue. Do you want to get the intersect from a single data.frame. then do `Reduce(intersect, df1)` – akrun Sep 15 '20 at 16:30

score 3 · Answer 5 · edited Mar 27 '18 at 14:48

3

To achieve this, you should assign the row names in both data frame and then process with intersect in R. This can be achieved with the following command:

intersect(dataframe.1$V1,dataframe.2$V2)

edited Mar 27 '18 at 14:48

sparkitny

1,503
4
18
23

answered Mar 27 '18 at 14:01

Bioifo

41
5

score 0 · Answer 6 · answered Jan 05 '21 at 10:07

Use intersect function of dplyr package

install this package
use this function like

var_name <- intersect(Name of 1st dataset, Name of 2nd dataset)

The best thing about this is it can work with character also while merge and inner_join cannot able to give results also you can use this function to see where is their location on your dataset

for (i in (no. of duplicates ) ) {
 for (j in ( no. of rows in other dataset  ) ) {
  if (new[i] == ac$variable_name[j]) {
  print(j)
  }
  } 
}

How to find common rows between two dataframe in R?

6 Answers6

data

Linked