I have two data frames and need to find the rows in the second one that are newly added. The second data frame can contain some rows from the first data frame plus some additional rows. I need the rows that appear only in the second data frame, i.e. those that are not in the first.

Below is an example with the expected output.

comp1<- data.frame(sector =c('Sector_123','Sector_456','Sector_789','Sector_101','Sector_111','Sector_113','Sector_115','Sector_117'), id=c(1,2,3,4,5,6,7,8) ,stringsAsFactors = FALSE)

comp2 <- data.frame(sector = c('Sector_456','Sector_789','Sector_000','Sector_222'), id=c(2,3,6,5),  stringsAsFactors = FALSE)

The expected output should look like this:

sector             id
Sector_000          6
Sector_222          5

I cannot use additional libraries such as compare or data.table. Any suggestions?

Hmm
    Does this answer your question? [Find complement of a data frame (anti - join)](https://stackoverflow.com/questions/28702960/find-complement-of-a-data-frame-anti-join) – Paul Jun 22 '20 at 12:24

1 Answer

This assumes rows should be matched on the sector column only. To compare on all columns, remove that restriction (in anti_join, omit the by argument).

We could use dplyr:

library(dplyr)
anti_join(comp2, comp1, by="sector")

gives us

> anti_join(comp2, comp1, by="sector")
      sector id
1 Sector_000  6
2 Sector_222  5

With base R we could use

comp2[!comp2$sector %in% comp1$sector,]
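
If rows should be compared on all columns rather than on sector alone, one possible base R sketch is to build a key per row and filter on it. The names key1, key2 and new_rows are made up for this illustration. The approach assumes both data frames have the same columns in the same order, and that no value contains the separator character used to build the key.

```r
# Example data from the question
comp1 <- data.frame(
  sector = c('Sector_123', 'Sector_456', 'Sector_789', 'Sector_101',
             'Sector_111', 'Sector_113', 'Sector_115', 'Sector_117'),
  id = c(1, 2, 3, 4, 5, 6, 7, 8),
  stringsAsFactors = FALSE
)
comp2 <- data.frame(
  sector = c('Sector_456', 'Sector_789', 'Sector_000', 'Sector_222'),
  id = c(2, 3, 6, 5),
  stringsAsFactors = FALSE
)

# Build one string key per row by pasting all columns together.
# Assumption: "\r" never occurs inside the data, so keys cannot collide.
key1 <- do.call(paste, c(comp1, sep = "\r"))
key2 <- do.call(paste, c(comp2, sep = "\r"))

# Keep the rows of comp2 whose key does not occur in comp1
new_rows <- comp2[!key2 %in% key1, ]
print(new_rows)
```

For the example data this keeps the Sector_000 and Sector_222 rows, the same as the sector-only filter. The two approaches differ only when a sector appears in both data frames with different id values.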
Martin Gal