0

I have about 20,000 unique identifiers in a column in one dataframe. I want to filter the data from a second, larger dataframe (about 180,000 rows) so that only the rows with matching identifiers are left.

So, dataframe 1 looks something like this

    df1 <- c("identifierab1","identifier56gh","identifier4znvh2")

dataframe 2 has 72 columns but one of the columns also has identifiers

    df2$relevantcolumn <- c("identifierab1", "identifierab1", "identifier9472bh")

Ultimately, I want to filter df2 so that the only rows left are the rows in which df2$relevantcolumn matches the values in df1. I have tried %in% and matches but nothing seems to work..

Oliver
  • 274
  • 1
  • 11

2 Answers2

1

As mentioned in the comments there were whitespaces in the data hence it didn't match. We can use trimws to remove the whitespace and then try to subset it.

df2[trimws(df2$relevantcolumn) %in% trimws(df1), ]

Or if df1 is dataframe

df2[trimws(df2$relevantcolumn) %in% trimws(df1$relevant_column), ]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
0

We can use grep

df2[grep(paste(df1, collapse="|"), df2$relevantcolumn),]
akrun
  • 874,273
  • 37
  • 540
  • 662