I would like to detect all duplicated coordinates in an object: I want to keep one row for each coordinate pair and mark all the others. If I try:

for(bb in 1:nrow(Cons)){
  for(cc in 1:nrow(Cons)){
    if(identical(Cons$lat[bb], Cons$lat[cc])&& identical(Cons$lng[bb], Cons$lng[cc])&& !identical(bb,cc)){
      Cons$X[bb] <- NA
    }
  }
}

Then every row that belongs to a duplicated pair of coordinates gets marked, including the first one. Any ideas how I can keep the first one?

Laura94

3 Answers

As the comments noted, this can be done with the duplicated function. Here is a reproducible example and solution:

# generate some data
# set.seed to make reproducible
set.seed(10)
lat.df <- data.frame(item=1:100, 
    lat=round(runif(100),1), 
    lon=round(runif(100),1))

head(lat.df)
   item lat lon
1    1 0.5 0.3
2    2 0.3 0.1
3    3 0.4 0.4
4    4 0.7 0.4
5    5 0.1 0.9
6    6 0.2 0.9

If we call the duplicated function on the lat/lon columns, it returns a logical vector that is TRUE for every row that repeats an earlier row and FALSE for the first occurrence of each pair. We can negate this vector and use it to subset our data frame so that only one row per unique pair remains:

lat.df.unique <- lat.df[!duplicated(lat.df[,c("lat","lon")]),]

# Check dimensions of unique data.frame to see if we have removed dups
dim(lat.df.unique)
[1] 69  3
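
If the goal is to mark the repeats rather than drop them, as in the question, the same logical vector can be used for assignment. The following is a minimal sketch: the small Cons data frame and its values are made up for illustration, and it assumes the column to blank out is called X, as in the question's loop.

```r
# Hypothetical stand-in for the asker's Cons data frame
Cons <- data.frame(
  lat = c(1.5, 2.0, 1.5, 2.0, 3.0),
  lng = c(4.0, 5.0, 4.0, 5.0, 6.0),
  X   = c("a", "b", "c", "d", "e"),
  stringsAsFactors = FALSE
)

# TRUE for every row whose (lat, lng) pair already appeared in an earlier row
is_dup <- duplicated(Cons[, c("lat", "lng")])

# Mark only the repeats; the first occurrence of each pair keeps its value
Cons$X[is_dup] <- NA
Cons
```

Because duplicated only flags the second and later occurrences, rows 3 and 4 are set to NA here while rows 1 and 2 keep their values, and no nested loop is needed.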
gfgm

In dplyr, you can do this:

library(dplyr)
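# .keep_all = TRUE keeps the other columns from the first row of each lat/lng pair
Cons %>% distinct(lat, lng, .keep_all = TRUE)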
s-heins

This is a pretty easy way to do it with data.table.

library(data.table)
setDT(Cons)
Cons = unique(Cons, by = c("lat","lng"))
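
The call above removes the repeated rows. To mark them instead, one option is data.table's duplicated method, which takes the same by argument. This is only a sketch: the example table and its values are invented, and the X column name is taken from the question.

```r
library(data.table)

# Hypothetical stand-in for the asker's Cons table
Cons <- data.table(
  lat = c(1.5, 2.0, 1.5, 2.0, 3.0),
  lng = c(4.0, 5.0, 4.0, 5.0, 6.0),
  X   = c("a", "b", "c", "d", "e")
)

# Set X to NA by reference for rows whose (lat, lng) pair appeared earlier;
# the first occurrence of each pair is left untouched
Cons[duplicated(Cons, by = c("lat", "lng")), X := NA]
Cons
```

The update happens in place, so no reassignment of Cons is needed.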
Kristofersen