The function unique in R is not working for real numbers

Question

I have a data frame that looks like the following:

> con.hull.mod
           x         y
1  2.5558145 4.1739617
2  5.0180096 5.4267733
3  5.0180096 5.4267733
4  6.2151346 6.0358932
5  6.3582981 6.1087375
6  6.3582053 5.8702711
7  6.3574907 4.0355980
8  6.3574907 4.0355980
9  6.3565247 1.5554874
10 6.3560029 0.2155812
11 4.0490978 0.6009829
12 0.9284811 3.3459437

Clearly, rows 7 and 8 are identical (as are rows 2 and 3). I am using the function unique() to get the non-repeating rows, but it seems that the function fails for real numbers (or am I mistaken in some way?).

> unique(con.hull.mod)
           x         y
1  2.5558145 4.1739617
2  5.0180096 5.4267733
3  5.0180096 5.4267733
4  6.2151346 6.0358932
5  6.3582981 6.1087375
6  6.3582053 5.8702711
7  6.3574907 4.0355980
8  6.3574907 4.0355980
9  6.3565247 1.5554874
10 6.3560029 0.2155812
11 4.0490978 0.6009829
12 0.9284811 3.3459437

How can I avoid this problem? Thanks in advance for your help :)

What class is your data in? If it is a data frame, try `distinct()` from the dplyr package. — Chamkrai, Jul 22 '22 at 17:56
It worked fine for me. Can you please give us _exactly_ what you have by running `dput(con.hull.mod)` and pasting the output into your question. — G5W, Jul 22 '22 at 17:57
Actually, "clearly" the *representation* of rows 7 and 8 looks the same, but that's all it is. If you set `options(digits=0)`, then "clearly" many more rows "look" identical, but they obviously are not. — r2evans, Jul 22 '22 at 17:57
If this is not a duplicate of https://stackoverflow.com/q/9508518/3358272, then I believe it is heavily informed by the fact that `(0.1 + 0.05) == 0.15` counter-intuitively returns `FALSE`, because of how floating-point numbers are stored in binary in virtually every programming language. See https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f. — r2evans, Jul 22 '22 at 17:59
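A minimal base-R illustration of the point in this comment: exact `==` fails on the sum, while `all.equal()` compares within a tolerance (by default `sqrt(.Machine$double.eps)`, roughly `1.5e-8`, relative).

```r
# Both sides are doubles; the sum is rounded to the nearest representable
# binary value, which is not the same double as the literal 0.15.
a <- 0.1 + 0.05
b <- 0.15

exact_equal <- a == b                   # exact comparison of the stored values
near_equal  <- isTRUE(all.equal(a, b))  # comparison within a small relative tolerance

print(c(exact_equal = exact_equal, near_equal = near_equal))

# 17 significant digits are enough to show that the two doubles differ,
# even though both print as 0.15 at the default 7 digits.
print(sprintf("%.17g", c(a, b)))
```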
If you don't provide the output of `dput(con.hull.mod)`, we cannot know for sure, since all we have are the 7 decimal digits shown in the printout, and we are going to find duplication where it likely does not exist in your data. Try `options(digits=22); con.hull.mod` and look again at rows 7 and 8 to see whether they differ, though even that is imperfect. Try `abs(sapply(con.hull.mod[7:8,], diff)) > 0`; I'm guessing you'll see `TRUE` for at least one of the values, indicating that there is some microscopic difference between them. — r2evans, Jul 22 '22 at 18:01
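A small self-contained reproduction of the situation this comment describes. The `1e-12` offset is a hypothetical stand-in for whatever tiny error the OP's computation introduced; only `dput()` on the real data can confirm the actual difference.

```r
# Two rows that look identical at the default 7 significant digits but
# differ by a tiny, hypothetical amount (1e-12) in x.
df <- data.frame(
  x = c(6.3574907, 6.3574907 + 1e-12),
  y = c(4.0355980, 4.0355980)
)

print(df)                      # at 7 significant digits the rows look the same
n_unique <- nrow(unique(df))   # unique() compares stored values, so both rows survive

# The check suggested in the comment: which columns differ at all?
differs <- abs(sapply(df, diff)) > 0
print(differs)

# Printing with more digits exposes the difference in x.
print(df, digits = 15)
```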
You are correct. There are indeed some microscopic differences when I use `options(digits=22)`, since the data frame is computed by my code. However, the two points are effectively the same coordinate for my purposes, and I want to get rid of the duplicates. How can I achieve this? Is there a way to truncate the digits in my data frame `con.hull.mod`? — Pratik Mullick, Jul 22 '22 at 22:01
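One common way to drop such near-duplicates, sketched in base R: round every column to a precision coarser than the floating-point noise but finer than any real difference between points, then call `duplicated()` on the rounded copy to decide which original rows to keep. The data below is hypothetical, and 6 decimal places is an assumed tolerance; choose a value that suits the scale of your coordinates.

```r
# Hypothetical data: rows 2 and 3 are the "same" point up to floating-point noise.
pts <- data.frame(
  x = c(2.5558145, 5.0180096, 5.0180096 + 3e-13, 6.2151346),
  y = c(4.1739617, 5.4267733, 5.4267733 - 1e-13, 6.0358932)
)

digits <- 6  # assumed tolerance: points equal to 6 decimal places count as identical

# round() is applied column-wise to an all-numeric data frame;
# duplicated() then flags rows of the rounded copy that repeat an earlier row.
keep <- !duplicated(round(pts, digits))

# Index the original (unrounded) data, so kept rows retain full precision.
pts_unique <- pts[keep, ]
print(pts_unique)
```

Rounding rather than truncating keeps the choice symmetric, and subsetting the original data means the rounding only decides which rows are duplicates; it never alters the values you keep.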

score 0 · Answer 1 · answered Jul 22 '22 at 17:59

unique() works with no problem on the data as posted:

unique(con.hull.mod)

output

           x         y
1  2.5558145 4.1739617
2  5.0180096 5.4267733
4  6.2151346 6.0358932
5  6.3582981 6.1087375
6  6.3582053 5.8702711
7  6.3574907 4.0355980
9  6.3565247 1.5554874
10 6.3560029 0.2155812
11 4.0490978 0.6009829
12 0.9284811 3.3459437

data

con.hull.mod <- structure(list(x = c(2.5558145, 5.0180096, 5.0180096, 6.2151346, 
6.3582981, 6.3582053, 6.3574907, 6.3574907, 6.3565247, 6.3560029, 
4.0490978, 0.9284811), y = c(4.1739617, 5.4267733, 5.4267733, 
6.0358932, 6.1087375, 5.8702711, 4.035598, 4.035598, 1.5554874, 
0.2155812, 0.6009829, 3.3459437)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
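The data above was retyped from the printed output, so its duplicates are exact. For computed coordinates like the OP's, rounding has one edge case: two nearly equal values on opposite sides of a rounding boundary (e.g. `0.1234564999` and `0.1234565001` at 6 decimal places) round differently and both survive. A tolerance-based filter avoids that. Below is a minimal base-R sketch; the helper name `unique_tol` and its default `tol = 1e-8` are hypothetical choices, not an existing R function.

```r
# Hypothetical helper: keep a row only if it is not within `tol` (absolute,
# in every column) of a row that has already been kept.
unique_tol <- function(df, tol = 1e-8) {
  m <- as.matrix(df)
  keep <- logical(nrow(m))
  kept_idx <- integer(0)
  for (i in seq_len(nrow(m))) {
    # Compare row i with every row kept so far (O(n^2); fine for small data).
    is_dup <- FALSE
    for (j in kept_idx) {
      if (all(abs(m[i, ] - m[j, ]) <= tol)) {
        is_dup <- TRUE
        break
      }
    }
    if (!is_dup) {
      keep[i] <- TRUE
      kept_idx <- c(kept_idx, i)
    }
  }
  df[keep, , drop = FALSE]
}

# Values that straddle a 6-decimal rounding boundary.
vals <- data.frame(x = c(0.1234564999, 0.1234565001, 0.5),
                   y = c(1, 1, 2))

n_round <- nrow(unique(round(vals, 6)))  # rounding splits the first two rows
n_tol   <- nrow(unique_tol(vals))        # the tolerance filter merges them
print(c(n_round = n_round, n_tol = n_tol))
```

Because "within `tol`" is not transitive, the result can depend on row order if several points form a chain of neighbours closer than `tol`; for well-separated points this rarely matters.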
