I have a data frame that looks like this:

> con.hull.mod
           x         y
1  2.5558145 4.1739617
2  5.0180096 5.4267733
3  5.0180096 5.4267733
4  6.2151346 6.0358932
5  6.3582981 6.1087375
6  6.3582053 5.8702711
7  6.3574907 4.0355980
8  6.3574907 4.0355980
9  6.3565247 1.5554874
10 6.3560029 0.2155812
11 4.0490978 0.6009829
12 0.9284811 3.3459437

Rows 7 and 8 appear to be identical (as do rows 2 and 3). I am using `unique()` to keep only the non-repeating rows, but the function seems to fail for real numbers (or am I mistaken in some way?).

> unique(con.hull.mod)
           x         y
1  2.5558145 4.1739617
2  5.0180096 5.4267733
3  5.0180096 5.4267733
4  6.2151346 6.0358932
5  6.3582981 6.1087375
6  6.3582053 5.8702711
7  6.3574907 4.0355980
8  6.3574907 4.0355980
9  6.3565247 1.5554874
10 6.3560029 0.2155812
11 4.0490978 0.6009829
12 0.9284811 3.3459437

How can I avoid this problem? Thanks in advance for your help :)

  • What class is your data in, data frame? If so, try `distinct()` – Chamkrai Jul 22 '22 at 17:56
  • It worked fine for me. Can you please give us _exactly_ what you have by running `dput(con.hull.mod)` and pasting the output into your question? – G5W Jul 22 '22 at 17:57
  • Actually, the *representation* of rows 7 and 8 looks the same, but that is all it is. If you set `options(digits=0)`, then many more rows "look" identical, though they obviously are not. – r2evans Jul 22 '22 at 17:57
  • If this is not a duplicate of https://stackoverflow.com/q/9508518/3358272, then I believe it is heavily informed by the fact that `(0.1 + 0.05) == 0.15` counter-intuitively returns `FALSE` because of how programming languages store floating-point numbers. See https://cran.r-project.org/doc/FAQ/R-FAQ.html#Why-doesn_0027t-R-think-these-numbers-are-equal_003f. – r2evans Jul 22 '22 at 17:59
  • If you won't provide the output of `dput(con.hull.mod)`, then we cannot know for sure, since all we have are 7 decimal digits in the data, and we're going to find duplication where it likely does not exist in yours. Try `options(digits=22); con.hull.mod` and look again at rows 7 and 8 to see if you see a difference, though even that is imperfect. Try `abs(sapply(con.hull.mod[7:8,], diff)) > 0` and I'm guessing you'll see `TRUE` for at least one of the values, indicating that there is some microscopic difference between them. – r2evans Jul 22 '22 at 18:01
  • You are correct. There are indeed some microscopic differences when I set `options(digits=22)`, as the data frame is an output from my code. However, the two points are basically the same coordinate for my purposes, and I want to get rid of the duplicates. How can I achieve this? Is there a way to truncate the digits in my data frame `con.hull.mod`? – Pratik Mullick Jul 22 '22 at 22:01
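One way to approach the last comment's question is to compare rounded copies of the rows rather than the raw values. The following is only a minimal sketch under stated assumptions: the data frame `pts`, the `1e-12` perturbation, and the choice of 7 decimal places are all hypothetical stand-ins, not the asker's actual data or tolerance.

```r
# Hypothetical data: rows 1 and 2 print identically at 7 significant digits
# but differ by about 1e-12 (the size of the difference is an assumption).
pts <- data.frame(
  x = c(6.3574907, 6.3574907 + 1e-12, 6.3565247),
  y = c(4.0355980, 4.0355980, 1.5554874)
)

# unique() compares the stored values, not the printed ones,
# so rows that merely look alike are all kept.
n_exact <- nrow(unique(pts))

# Round a copy to a chosen number of decimals (7 here, an assumption),
# then keep the original rows whose rounded form has not been seen before.
digits  <- 7
rounded <- round(pts, digits)
deduped <- pts[!duplicated(rounded), ]
n_rounded <- nrow(deduped)
```

With this toy data, `unique()` should keep all three rows, while the rounded comparison keeps two. Note that rounding is a blunt tolerance: two values that straddle a rounding boundary can still end up in different bins, so the number of digits should be chosen well above the noise level of the computation.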

1 Answer

`unique()` works with no problem on the data as posted:

unique(con.hull.mod)

  • output
           x         y
1  2.5558145 4.1739617
2  5.0180096 5.4267733
4  6.2151346 6.0358932
5  6.3582981 6.1087375
6  6.3582053 5.8702711
7  6.3574907 4.0355980
9  6.3565247 1.5554874
10 6.3560029 0.2155812
11 4.0490978 0.6009829
12 0.9284811 3.3459437
  • data
con.hull.mod <- structure(list(x = c(2.5558145, 5.0180096, 5.0180096, 6.2151346, 
6.3582981, 6.3582053, 6.3574907, 6.3574907, 6.3565247, 6.3560029, 
4.0490978, 0.9284811), y = c(4.1739617, 5.4267733, 5.4267733, 
6.0358932, 6.1087375, 5.8702711, 4.035598, 4.035598, 1.5554874, 
0.2155812, 0.6009829, 3.3459437)), class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12"))
Mohamed Desouky