If 2 columns of data have the same values in the row above it, then add 0.00001 to it

Question

I have a data frame that contains a column for Latitude and a column for Longitude, and it looks like this:

test <- data.frame("Latitude" = c(45.14565, 45.14565, 45.14565, 45.14565, 33.2222, 
31.22122, 31.22122), "Longitude" = c(-105.6666, -105.6666, -105.6666, -104.3333, 
-104.3333, -105.77777, -105.77777))

I would like to show every value to 5 decimal places and, whenever a latitude and longitude pair is the same as the pair in the row above it, add 0.00001 to both the latitude and the longitude value. So my data would change to this:

test_updated <- data.frame("Latitude" = c(45.14565, 45.14566, 45.14567, 45.14565, 
33.22220, 31.22122, 31.22123), "Longitude" = c(-105.66660, -105.66661, -105.66662, 
-104.33330, -104.33330, -105.77777, -105.77778))

Related: [Increment by one to each duplicate value in R](https://stackoverflow.com/questions/43196718/increment-by-one-to-each-duplicate-value-in-r) — Henrik, Aug 03 '21 at 16:03

Uwe · Answer 1 · 2021-08-03T17:40:50.393

Here is an approach which updates the Latitude and Longitude columns in test, aiming to reproduce OP's expected result:

options(digits = 8) # required to print all significant digits of Longitude
library(data.table)
setDT(test)[, `:=`(Latitude  = Latitude  + (seq(.N) - 1) * 0.00001,
                   Longitude = Longitude + (seq(.N) - 1) * 0.00001), 
            by = .(Latitude, Longitude)]
test

   Latitude  Longitude
1: 45.14565 -105.66660
2: 45.14566 -105.66659
3: 45.14567 -105.66658
4: 45.14565 -104.33330
5: 33.22220 -104.33330
6: 31.22122 -105.77777
7: 31.22123 -105.77776

For comparison

test_updated

  Latitude  Longitude
1 45.14565 -105.66660
2 45.14566 -105.66661
3 45.14567 -105.66662
4 45.14565 -104.33330
5 33.22220 -104.33330
6 31.22122 -105.77777
7 31.22123 -105.77778

The discrepancy arises because OP's stated requirement is to add 0.00001 to both the latitude and the longitude value, whereas in OP's expected result 0.00001 has been subtracted from the negative longitude values.

Edit

In order to reproduce the expected result, the sign of the value has to be considered. Unfortunately, the base R sign() function returns zero for sign(0), which would leave a coordinate of exactly zero unshifted. So, we use fifelse(x < 0, -1, 1) instead.
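
A small base R illustration of the point about sign(0). It is only a sketch: ifelse() stands in for data.table's fifelse(), and the vector x and the name direction are made up for the example.

```r
# hypothetical coordinates, including an exact zero
x <- c(-105.6666, 0, 45.14565)

# sign() maps zero to 0, so a zero coordinate would never be shifted
sign(x)

# treating zero as positive gives a usable direction for every value
direction <- ifelse(x < 0, -1, 1)
direction

# shift each value one step of 0.00001 away from zero
x + direction * 0.00001
```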

In addition, we can pick up Henrik's splendid idea to use the rowid() function (here in its rowidv() form, which takes the columns as a list) to avoid grouping.

options(digits = 8) # required to print all significant digits of Longitude
library(data.table)
cols <- c("Latitude", "Longitude")
setDT(test)[, (cols) := lapply(.SD, \(x) x + fifelse(x < 0, -1, 1) * 
                                 (rowidv(.SD, cols) - 1) * 0.00001), .SDcols = cols]
test

   Latitude  Longitude
1: 45.14565 -105.66660
2: 45.14566 -105.66661
3: 45.14567 -105.66662
4: 45.14565 -104.33330
5: 33.22220 -104.33330
6: 31.22122 -105.77777
7: 31.22123 -105.77778
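
For readers without data.table, the same idea can be sketched in base R. This is only an illustration and not part of the answer's own code: ave() with seq_along stands in for rowidv(), ifelse() for fifelse(), and the names occurrence, step and shifted are made up here.

```r
test <- data.frame(
  Latitude  = c(45.14565, 45.14565, 45.14565, 45.14565, 33.2222, 31.22122, 31.22122),
  Longitude = c(-105.6666, -105.6666, -105.6666, -104.3333, -104.3333, -105.77777, -105.77777)
)

# running count of each (Latitude, Longitude) pair:
# 1 for its first occurrence, 2 for the second, and so on
occurrence <- ave(seq_len(nrow(test)), test$Latitude, test$Longitude, FUN = seq_along)

# the first occurrence is left alone, later ones move further from zero
step <- (occurrence - 1) * 0.00001
shifted <- data.frame(
  Latitude  = test$Latitude  + ifelse(test$Latitude  < 0, -1, 1) * step,
  Longitude = test$Longitude + ifelse(test$Longitude < 0, -1, 1) * step
)

print(format(round(shifted, 5), nsmall = 5))
```

Like the grouped data.table version, this counts every repeat of a pair anywhere in the data, not only repeats on directly neighbouring rows.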

Or `test[, Latitude := Latitude + (rowid(Latitude, Longitude) - 1) * 0.00001]`, to avoid `by`. — Henrik, Aug 03 '21 at 15:53

score 0 · Answer 2 · answered Aug 03 '21 at 15:22

As usual there is no need to use a loop:

library(dplyr)
test_updated = test %>% 
    mutate(
        # compare each column with its own previous row and add 0.00001 on a match
        across(c(Latitude, Longitude), 
            function(x) if_else(x == lag(x), x+0.00001, x)
            )
        )

format(round(test_updated, 5), nsmall = 5)

  Latitude  Longitude
1 45.14566 -105.66659
2 45.14566 -105.66659
3 45.14566 -105.66659
4 45.14566 -104.33329
5 33.22221 -104.33329
6 31.22123 -105.77776
7 31.22123 -105.77776

score -1 · Answer 3 · answered Aug 03 '21 at 14:56

Not sure if I understand you correctly, but maybe something like this?

# assumes the data frame `test` from the question already exists in the workspace
n <- nrow(test)
test_updated <- data.frame(Latitude = double(n),
                           Longitude = double(n))

add <- 0.00001
test_updated[1, ] <- test[1, ]
for (i in 2:n) {
  # same pair as in the row above? then shift both coordinates by `add`
  if (test$Latitude[i - 1] == test$Latitude[i] && test$Longitude[i - 1] == test$Longitude[i]) {
    test_updated$Latitude[i] <- test$Latitude[i] + add
    test_updated$Longitude[i] <- test$Longitude[i] + add
  } else {
    test_updated[i, ] <- test[i, ]
  }
}
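
The loop adds the offset once for each row that repeats the row above it. If the offset should instead grow along a run of identical neighbouring rows (0, 0.00001, 0.00002, ...), one possible loop-free sketch in base R uses rle(). The pasted key and the names position and step are assumptions made for this illustration, and like the loop it adds the offset to both columns regardless of sign.

```r
test <- data.frame(
  Latitude  = c(45.14565, 45.14565, 45.14565, 45.14565, 33.2222, 31.22122, 31.22122),
  Longitude = c(-105.6666, -105.6666, -105.6666, -104.3333, -104.3333, -105.77777, -105.77777)
)

# one key per row; equal keys on neighbouring rows form a run
key <- paste(test$Latitude, test$Longitude)

# position of each row within its run of consecutive identical keys
position <- sequence(rle(key)$lengths)

# the first row of a run is unchanged, each later row gets one more step
step <- (position - 1) * 0.00001
test_updated <- data.frame(
  Latitude  = test$Latitude  + step,
  Longitude = test$Longitude + step
)

print(format(round(test_updated, 5), nsmall = 5))
```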
