I have a question about replacing the values of a numeric sequence in R. I have a column in a data.table that looks like id in X here:

library(data.table)
X <- data.table(id = c("103", "103", "103", "104", "104", "160", "160"),
                content = c("I", "don't", "know", "some", "more", "words", "."))

I would like to replace the id values with sequential values so that they start at 0 and have no gaps in between. In the real data there are thousands of distinct id values, so handling them one by one (for example with grep) is not feasible.

So what I would like to achieve is something like this:

Y <- data.table(id = c("0", "0", "0", "1", "1", "2", "2"), 
content = c("I", "don't", "know", "some", "more", "words", "."))

Any hint would be welcome as I don't know how to start. Thank you so much in advance!

  • Related: [How to get ranks with no gaps when there are ties among values?](https://stackoverflow.com/questions/4915704/how-to-get-ranks-with-no-gaps-when-there-are-ties-among-values). `X[ , id := frank(as.numeric(id), ties.method = "dense") - 1]` – Henrik Oct 29 '18 at 20:48

2 Answers

We can convert 'id' to a factor and then coerce it to integer:

X[, id :=  as.character(as.integer(factor(id)) - 1)]

Or use match

X[, id := as.character(match(id, unique(id)) - 1)]

Or another option is .GRP

X[, id := as.character(.GRP - 1), by = id]

identical(X, Y)
#[1] TRUE
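
These options agree on the sample data, but they need not agree in general. Here is a minimal sketch, using a made-up `id` vector that is not from the question, of how the factor approach can differ from `match`. `factor()` sorts character levels as strings, so "10" comes before "9", while `match(id, unique(id))` numbers the values in order of first appearance. Converting with `as.numeric()` first, as in Henrik's comment under the question, gives numeric ordering instead.

```r
# Hypothetical ids (assumption for illustration): stored as character,
# with different numbers of digits
id <- c("9", "9", "10", "10", "9")

# Levels are sorted as strings, so "10" gets the lower code
by_factor <- as.integer(factor(id)) - 1L

# Codes follow the order in which each value first appears
by_match <- match(id, unique(id)) - 1L

# Levels are sorted numerically after conversion
by_numeric <- as.integer(factor(as.numeric(id))) - 1L

by_factor   # 1 1 0 0 1
by_match    # 0 0 1 1 0
by_numeric  # 0 0 1 1 0
```

If the ids all have the same number of digits and are already sorted, as in the question, the three results coincide.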

Or using tidyverse

library(tidyverse)
X %>%
   mutate(id = as.character(match(id, unique(id)) - 1))

Or

X %>% 
  mutate(id = as.character(group_indices(., id) - 1))

Or

X %>% 
   mutate(id = as.character(cumsum(id != lag(id, default = first(id)))))

or with base R

X$id <- as.character(match(X$id, unique(X$id)) - 1)
akrun
  • Thank you so much for all the different approaches. I ended up using the very first solution using factors as this is the one I was thinking of at first, so it's closest to my initial understanding. But it is very interesting to see all the possibilities. – Klaus_Wuppertaler Nov 07 '18 at 11:06

Another option is rleid, which numbers consecutive runs of identical values:

library(data.table)
X[, id := rleid(id) - 1L][]
#   id content
#1:  0       I
#2:  0   don't
#3:  0    know
#4:  1    some
#5:  1    more
#6:  2   words
#7:  2       .

If you want id to be of type character then do

X[, id := as.character(rleid(id) - 1L)]
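
One caveat, sketched here with a made-up vector that is not from the question: `rleid` gives a new number to every run, so an id that reappears later in the column gets a different number, whereas `match(id, unique(id))` reuses the number from the first appearance.

```r
library(data.table)

# Hypothetical ids (assumption for illustration): "103" reappears after "104"
id <- c("103", "103", "104", "103")

# New number for every consecutive run of equal values
by_run <- rleid(id) - 1L

# Same number for the same value, wherever it occurs
by_value <- match(id, unique(id)) - 1L

by_run    # 0 0 1 2
by_value  # 0 0 1 0
```

When equal ids always sit next to each other, as in the question's data, both give the same result.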
markus