I want to identify the rows in dataframe1 that are not present in dataframe2, based on a particular column. I used the code below to get this information.

diffId <- anti_join(dat$ID,datwe$ID)

Unfortunately, I got an error:

Error in UseMethod("anti_join") :
no applicable method for 'anti_join' applied to an object of class "factor"

I checked the class of this column in both data frames, and in both it is factor. I also tried extracting the column into a separate variable, on the assumption that this might solve the issue, but with no luck:

fac1 <- datwe$ID
fac2 <- dat$ID
diffId <- anti_join(fac2,fac1)

Could you please share your thoughts?

Thanks

marc_s
Prradep

1 Answer

Almost all dplyr functions operate on tbls (depending on the context, a tbl can be a data.frame, a data.table, a database connection, and so on), so what you really want is to pass the whole data frames and name the join column, something like this:

> dat <- data.frame(ID=c(1, 3, 6, 4), x=runif(4))
> datwe <- data.frame(ID=c(3, 5, 8), y=runif(3))
> anti_join(dat, datwe, by='ID') %>% select(ID)
  ID
1  4
2  6
3  1

Note that the row order of dat is not preserved in the output above.

If the ID columns are factors with different levels (unlike the numerics in the example above), the join involves a conversion from factor to character.
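The factor case can be sketched as follows. This is only an illustration under assumptions: the ID values and the names dat_chr and datwe_chr are made up, and converting the key to character up front is one way to make the comparison explicit rather than relying on implicit coercion.

```r
library(dplyr)

# Hypothetical data: ID is a factor in both frames, with different level sets
dat   <- data.frame(ID = factor(c("a", "c", "f", "d")), x = 1:4)
datwe <- data.frame(ID = factor(c("c", "e", "h")), y = 1:3)

# Convert the key to character first so the join compares labels explicitly
dat_chr   <- mutate(dat, ID = as.character(ID))
datwe_chr <- mutate(datwe, ID = as.character(ID))

# Rows of dat whose ID does not occur in datwe
diffId <- anti_join(dat_chr, datwe_chr, by = "ID")
print(diffId)
```

The result keeps the other columns of dat (here x); add select(ID) if only the key is needed.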

If you want to operate on vectors instead, you can use setdiff (available in both base R and dplyr):

> setdiff(dat$ID, datwe$ID)
[1] 1 6 4
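Since the columns in the question are factors, here is a minimal base R sketch of the same vector approach. The factor values are invented for illustration, and the explicit as.character() calls are a cautious choice so that labels are compared rather than the underlying integer codes.

```r
# Hypothetical factor vectors standing in for dat$ID and datwe$ID
fac2 <- factor(c("a", "c", "f", "d"))
fac1 <- factor(c("c", "e", "h"))

# Compare the labels explicitly; the result is a character vector
diffId <- setdiff(as.character(fac2), as.character(fac1))
print(diffId)

# The same labels can be used to index back into the original vector
keep <- !(as.character(fac2) %in% as.character(fac1))
print(fac2[keep])
```

The second form, using %in%, is a different technique from setdiff: it gives a logical index, which can also be used to subset the full data frame.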
Matt Bannert
zero323
  • Thanks @zero323. It worked with the above suggestion. (I will keep this thread open for a while to learn other ways of doing this task, and then mark this one as the answer.) Could you be kind enough to explain why it didn't work when there was only one column? – Prradep Jun 04 '15 at 08:32
  • It works perfectly if there is only one column. Problem is `dat$ID` is no longer a `data.frame` but a vector. – zero323 Jun 04 '15 at 08:33
  • Oh, `anti_join` can be used only on a `data.frame`. Thanks! – Prradep Jun 04 '15 at 08:34
  • To be precise, on objects that can be coerced to `dplyr::tbl`. – zero323 Jun 04 '15 at 08:36
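
The distinction discussed in these comments can be checked directly. The sketch below reuses the ID values from the answer with made-up x and y values: `dat$ID` extracts a plain vector, whereas single-bracket indexing such as `dat["ID"]` keeps a one-column data frame that `anti_join` accepts.

```r
library(dplyr)

# Same IDs as in the answer; x and y are fixed here only for illustration
dat   <- data.frame(ID = c(1, 3, 6, 4), x = c(0.1, 0.2, 0.3, 0.4))
datwe <- data.frame(ID = c(3, 5, 8), y = c(0.5, 0.6, 0.7))

is.data.frame(dat$ID)     # FALSE: `$` returns the column as a vector
is.data.frame(dat["ID"])  # TRUE: `[` with a name keeps a data frame

# A one-column data frame works with anti_join
diffId <- anti_join(dat["ID"], datwe["ID"], by = "ID")
print(diffId)
```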