I want to remove data from a dataframe that is present in another dataframe. Let me give an example:

letters<-c('a','b','c','d','e')
numbers<-c(1,2,3,4,5)
list_one<-data.frame(letters,numbers)

I want to remove every row in list_one whose letters value also appears in this other dataframe:

letters2<-c('a','c','d')
list_two<-data.frame(letters2)

I then want to create a final dataframe that has only the letters b and e and their corresponding numbers. How do I do this?

I should mention that I'm actually trying to do this with two large CSV files, so I really can't use the negative expression - to take out the rows.

I'm new to R so it's hard to research questions when I'm not quite sure what key terms to search. Any help is appreciated, thanks!

kevluv93

3 Answers

A dplyr solution

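library(dplyr)

# list_one and list_two share no column names, so the key columns must be named in by
list_one %>% anti_join(list_two, by = c("letters" = "letters2"))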

bramtayl

Base R Solution

list_one[!list_one$letters %in% list_two$letters2,]

gives you:

  letters numbers
2       b       2
5       e       5

Explanation:

> list_one$letters %in% list_two$letters2
[1]  TRUE FALSE  TRUE  TRUE FALSE

This gives you a logical vector of the same length as list_one$letters, with TRUE wherever the letter is present in list_two$letters2. ! negates this vector, so you end up with TRUE only for the rows whose letter is not in list_two$letters2, and those are the rows that get kept.
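To make the steps above easier to follow, here is a minimal sketch that builds the mask one step at a time. The names in_both, keep and result are just illustrative names I made up, and stringsAsFactors = FALSE is set only so the columns are plain character vectors regardless of R version.

```r
# Rebuild the example data from the question
list_one <- data.frame(letters = c("a", "b", "c", "d", "e"),
                       numbers = c(1, 2, 3, 4, 5),
                       stringsAsFactors = FALSE)
list_two <- data.frame(letters2 = c("a", "c", "d"),
                       stringsAsFactors = FALSE)

# TRUE where the letter in list_one also appears in list_two
in_both <- list_one$letters %in% list_two$letters2

# Negate the mask: TRUE now marks the rows to keep (illustrative name)
keep <- !in_both

# Index the rows with the logical vector; all columns are kept
result <- list_one[keep, ]
print(result)
```

Because the mask is computed from the columns themselves, you never have to type out row positions, which is why this also works for large files.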

If you have questions about how to select rows from a data.frame, enter

?`[.data.frame`

in the console and read the help page.

Rentrop

This answer is in response to your edit: "so I really can't use the negative expression".

I guess one of the most efficient ways to do this is using data.table as follows:

require(data.table)
setDT(list_one)
setDT(list_two)
list_one[!list_two, on=c(letters = "letters2")]

Or

require(data.table)
setDT(list_one, key = "letters")
setDT(list_two, key = "letters2")
list_one[!list_two]

(Thanks to Frank for the improvement)

Result:

   letters numbers
1:       b       2
2:       e       5

Have a look at ?"data.table" and at "Quickly reading very large tables as dataframes in R" for why you should use data.table::fread to read the CSV files in the first place.
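As a rough sketch of how that could look end to end: the two temporary files below are hypothetical stand-ins for your real CSV files (I'm assuming they have the same columns as the example), so swap in your own paths.

```r
library(data.table)

# Hypothetical stand-ins for the two large CSV files
file_one <- tempfile(fileext = ".csv")
file_two <- tempfile(fileext = ".csv")
fwrite(data.table(letters = c("a", "b", "c", "d", "e"), numbers = 1:5), file_one)
fwrite(data.table(letters2 = c("a", "c", "d")), file_two)

# fread returns data.tables, so no setDT() call is needed afterwards
list_one <- fread(file_one, header = TRUE)
list_two <- fread(file_two, header = TRUE)

# Not-join: keep the rows of list_one with no match in list_two
result <- list_one[!list_two, on = c(letters = "letters2")]
print(result)
```

header = TRUE is passed explicitly here only because the second file contains nothing but text, so I'd rather not rely on header auto-detection.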

BTW: If you have the vector letters2 instead of the data.table list_two, you can use

list_one[!J(letters2)]

Rentrop