adjusting value of column based on if duplicate row - iteratively R

Question

Say I have this dataset:

df <- data.frame(time = c(100, 101, 101, 101, 102, 102, 103, 105, 109, 109, 109),
           val = c(1,3,1,2,3,1,2,3,1,2,1))

df

   time val
1   100   1
2   101   3
3   101   1
4   101   2
5   102   3
6   102   1
7   103   2
8   105   3
9   109   1
10  109   2
11  109   1

We can identify duplicate times in the 'time' column like this:

df[duplicated(df$time),]

What I want to do is to adjust the value of time (add 0.1) if it's duplicate. I could do this like this:

df$time <- ifelse(duplicated(df$time),df$time+.1,df$time)

   time val
1  100.0   1
2  101.0   3
3  101.1   1
4  101.1   2
5  102.0   3
6  102.1   1
7  103.0   2
8  105.0   3
9  109.0   1
10 109.1   2
11 109.1   1

The issue here is that we still have duplicate values e.g.rows 3 and 4 (that they differ in the column 'val' is irrelevant). Rows 10 and 11 have the same problem. Rows 5 and 6 are fine.

Is there a way of doing this iteratively - i.e. adding 0.1 to first duplicate, 0.2 to second duplicate (of same time value) etc. This way row 4 would become 101.2, and row 11 would become 109.2 . The number of duplicates per value is unknown but will never equal 10 (usually maximum 4).

Can't you create a counter for each 'time', as described in [Numbering rows within groups in a data frame](https://stackoverflow.com/questions/12925063/numbering-rows-within-groups-in-a-data-frame), and multiply by 0.1? Something like `ave(df$time, df$time, FUN = function(x) x + (seq_along(x) - 1) * 0.1)` — Henrik, Sep 27 '18 at 16:53

IceCreamToucan · Accepted Answer · 2018-09-27T18:11:34.670

1

As in the top answer for the related question linked by @Henrik, this uses data.table::rowid

library(data.table)
setDT(df)

df[, time := time + 0.1*(rowid(time) - 1)]

#      time val
#  1: 100.0   1
#  2: 101.0   3
#  3: 101.1   1
#  4: 101.2   2
#  5: 102.0   3
#  6: 102.1   1
#  7: 103.0   2
#  8: 105.0   3
#  9: 109.0   1
# 10: 109.1   2
# 11: 109.2   1

edited Sep 27 '18 at 18:11

answered Sep 27 '18 at 18:07

IceCreamToucan

28,083
2
22
38

Shree · Answer 2 · 2018-09-27T18:01:42.643

Here's a one line solution using base R -

df <- data.frame(time = c(100, 101, 101, 101, 102, 102, 103, 105, 109, 109, 109),
           val = c(1,3,1,2,3,1,2,3,1,2,1))
df$new_time <- df$time + duplicated(df$time)*0.1*(ave(seq_len(nrow(df)), df$time, FUN = seq_along) - 1)
df
#    time val new_time
# 1   100   1    100.0
# 2   101   3    101.0
# 3   101   1    101.1
# 4   101   2    101.2
# 5   102   3    102.0
# 6   102   1    102.1
# 7   103   2    103.0
# 8   105   3    105.0
# 9   109   1    109.0
# 10  109   2    109.1
# 11  109   1    109.2

acylam · Answer 3 · 2018-09-27T18:15:27.393

With dplyr:

library(dplyr)

df %>%
  group_by(time1 = time) %>%
  mutate(time = time + (0:(n()-1))*0.1) %>%
  ungroup() %>%
  select(-time1)

or with row_number() (suggested by Henrik):

df %>%
  group_by(time1 = time) %>%
  mutate(time = time + (row_number()-1)*0.1) %>%
  ungroup() %>%
  select(-time1)

Output:

    time val
1  100.0   1
2  101.0   3
3  101.1   1
4  101.2   2
5  102.0   3
6  102.1   1
7  103.0   2
8  105.0   3
9  109.0   1
10 109.1   2
11 109.2   1

adjusting value of column based on if duplicate row - iteratively R

3 Answers3