
I'm using the `duplicated()` function on a data.table object. It returns FALSE for two values that appear to be identical.

Looking deeper into them, it appears that they have a tiny difference (of -1.867777e-14, but it can be any other near-zero value).

For my needs, this is a bug. How would you fix it, without changing the values in the table?

  • You could `round()` your values. – mtoto Jun 13 '16 at 08:18
  • Yeah, that's what I'm thinking about, just wondering exactly how (not all of my columns are numeric and I'm not sure how precise I want to be). But it should work out. – Yonatan Lazar Telem Jun 13 '16 at 08:20
  • Do you mean some columns are character? Maybe you want to detect duplicates based on fuzzy matching of strings, see: http://stackoverflow.com/questions/11535625 – zx8754 Jun 13 '16 at 08:45
  • The character columns should have an exact match. – Yonatan Lazar Telem Jun 13 '16 at 08:51
  • If you want to group numbers that lie within an exact `tolerance` distance of each other, you can use a rolling self-join with `roll=tolerance` / `roll=-tolerance` and `rollends=TRUE`. Be aware that high floating-point precision is platform-specific in any programming language. – jangorecki Jun 13 '16 at 09:37

1 Answer


You can apply `round()` to the values before calling `duplicated()`:

> x<-c(10.258963,10.258962)
> duplicated(x)
[1] FALSE FALSE
> duplicated(round(x,5))
[1] FALSE  TRUE
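
The same effect shows up with values that differ only in their last bits, which is closer to the situation in the question. The following is a minimal sketch in base R; the precision of 8 digits is an assumption chosen for illustration, not a recommendation.

```r
# 0.3 and 0.1 + 0.2 print identically but are not the same double.
x <- c(0.3, 0.1 + 0.2)
print(x)

# duplicated() compares doubles exactly, so no duplicate is reported.
exact_dup <- duplicated(x)
print(exact_dup)    # FALSE FALSE

# Rounding first removes the near-zero difference.
digits <- 8         # assumed precision; pick one that suits your data
rounded_dup <- duplicated(round(x, digits))
print(rounded_dup)  # FALSE  TRUE

# Caveat: two close values on opposite sides of a rounding boundary
# still round to different numbers.
y <- c(0.1249999999, 0.1250000001)
boundary_dup <- duplicated(round(y, 2))
print(boundary_dup) # FALSE FALSE
```

Rounding is therefore a bucketing scheme rather than a true tolerance check, so it is worth rounding to a precision well below the differences that matter in the data.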
rar
  • Thanks. How would I do that if I need to check duplications on some values that are not numeric (and therefore can't be rounded) as well, together with my numerics? – Yonatan Lazar Telem Jun 13 '16 at 08:31
  • See this question for an example: http://stackoverflow.com/questions/13742446/duplicates-in-multiple-columns – Paul Hiemstra Jun 13 '16 at 08:36
  • So according to the example if I have `numeric_cols` and `non_numeric_cols` column names, I would use something like `duplicated(dt[, non_numeric_cols, with=FALSE], by=NULL) & duplicated(round(dt[, numeric_cols, with=FALSE], by=NULL), 5)`? – Yonatan Lazar Telem Jun 13 '16 at 08:43
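
One possible way to handle mixed column types without touching the original table is to round the numeric columns in a copy and run `duplicated()` on that copy. This is only a sketch: the table `dt`, its columns `id` and `value`, and the precision of 8 digits are hypothetical and stand in for the real data.

```r
library(data.table)

# Hypothetical example table: one character column, one numeric column.
dt <- data.table(
  id    = c("a", "a", "b", "b"),
  value = c(0.3, 0.1 + 0.2, 1.5, 2.5)
)

# Columns stored as doubles are the only ones that need rounding.
numeric_cols <- names(dt)[vapply(dt, is.double, logical(1))]

# Round inside a copy so the values in dt stay unchanged.
rounded <- copy(dt)
rounded[, (numeric_cols) := lapply(.SD, round, digits = 8),
        .SDcols = numeric_cols]

# Whole-row comparison: character columns must match exactly,
# numeric columns must match after rounding.
dup <- duplicated(rounded, by = names(rounded))
print(dup)       # FALSE  TRUE FALSE FALSE

# Use the flags to filter the original, unrounded table.
deduped <- dt[!dup]
print(deduped)
```

Checking all columns in a single `duplicated()` call matters here. Combining two separate `duplicated()` results with `&` can flag a row whose character columns repeat one earlier row while its numeric columns repeat a different one, even though no earlier row matches it as a whole.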