6

I am trying to find a proper way, in R, to find duplicated values, and add the value 1 to each subsequent duplicated value grouped by id. For example:

data = data.table(id = c('1','1','1','1','1','2','2','2'),
                  value = c(95,100,101,101,101,20,35,38))

data$new_value <- ifelse(data[ , data$value] == lag(data$value,1),
                         lag(data$value, 1) + 1 ,data$value)
data$desired_value <- c(95,100,101,102,103,20,35,38)

Produces:

   id value new_value desired_value
1:  1    95        NA            95
2:  1   100       100           100
3:  1   101       101           101 # first 101 in id 1: add 0
4:  1   101       102           102 # second 101 in id 1: add 1
5:  1   101       102           103 # third 101 in id 1: add 2
6:  2    20        20            20
7:  2    35        35            35
8:  2    38        38            38

I tried doing this with ifelse, but it doesn't work recursively so it only applies to the following row, and not any subsequent rows. Also the lag function results in me losing the first value in value.

I've seen examples with character variables with make.names or make.unique, but haven't been able to find a solution for a duplicated numeric value.

Background: I am doing a survival analysis and I am finding that with my data there are stop times that are the same, so I need to make it unique by adding a 1 (stop times are in seconds).

Henrik
  • 65,555
  • 14
  • 143
  • 159
Daren Eiri
  • 137
  • 12
  • Related: [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame) – Henrik Aug 03 '21 at 18:42

4 Answers4

6

Here's an attempt. You're essentially grouping by id and value and adding 0:(length(value)-1). So:

data[, onemore := value + (0:(.N-1)), by=.(id, value)]

#   id value new_value desired_value onemore
#1:  1    95        96            95      95
#2:  1   100       101           100     100
#3:  1   101       102           101     101
#4:  1   101       102           102     102
#5:  1   101       102           103     103
#6:  2    20        21            20      20
#7:  2    35        36            35      35
#8:  2    38        39            38      38
thelatemail
  • 91,185
  • 12
  • 128
  • 188
  • 1
    @akrun and @thelatemail I see. So here we use the "by" to group, which .N handles and becomes the total number of observations in that group. With `101` appearing 3 times within id `1`, .N = 3, or `(0:(3-1))`. That results in `value + (0:2)`, or simply `value + 0, value +1, value +2`. Thank you! – Daren Eiri Apr 04 '17 at 16:46
5

With base R we can use ave where we take the first value of each group and basically add the row number of that row in that group.

data$value1 <- ave(data$value, data$id, data$value, FUN = function(x)
                                                      x[1] + seq_along(x) - 1)

#   id value new_value desired_value value1
#1:  1    95        96            95     95
#2:  1   100       101           100    100
#3:  1   101       102           101    101
#4:  1   101       102           102    102
#5:  1   101       102           103    103
#6:  2    20        21            20     20
#7:  2    35        36            35     35
#8:  2    38        39            38     38
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
5

Here is one option with tidyverse

library(dplyr)
data %>%
    group_by(id, value) %>%
    mutate(onemore = value + row_number()-1)
#     id value onemore
#  <chr> <dbl>   <dbl>
#1     1    95      95
#2     1   100     100
#3     1   101     101
#4     1   101     102
#5     1   101     103
#6     2    20      20
#7     2    35      35
#8     2    38      38

Or we can use base R without anonymous function call

data$onemore <- with(data, value + ave(value, id, value, FUN =seq_along)-1)
data$onemore
#[1]  95 100 101 102 103  20  35  38
akrun
  • 874,273
  • 37
  • 540
  • 662
1

To avoid (a potentially costly) by, you may use rowid:

data[, res := value + rowid(id, value) - 1]
# data
#    id value new_value desired_value res
# 1:  1    95        96            95  95
# 2:  1   100       101           100 100
# 3:  1   101       102           101 101
# 4:  1   101       102           102 102
# 5:  1   101       102           103 103
# 6:  2    20        21            20  20
# 7:  2    35        36            35  35
# 8:  2    38        39            38  38 
Henrik
  • 65,555
  • 14
  • 143
  • 159