Replacing a fragmented numeric sequence with continuous sequence in data.table

Question

I have a question about changing the values of a numeric sequence in R. I have a column in a data.table that looks something like id in X here:

library(data.table)
X <- data.table(id = c("103", "103", "103", "104", "104", "160", "160"),
                content = c("I", "don't", "know", "some", "more", "words", "."))

I would like to replace the id values with sequential values, both to change the starting point and to get rid of the gaps in between. In the real-life problem there would be thousands of id values, so grep-ing them wouldn't be feasible.

So what I would like to achieve is something like this:

Y <- data.table(id = c("0", "0", "0", "1", "1", "2", "2"), 
content = c("I", "don't", "know", "some", "more", "words", "."))

Any hint would be welcome as I don't know how to start. Thank you so much in advance!

Related: [How to get ranks with no gaps when there are ties among values?](https://stackoverflow.com/questions/4915704/how-to-get-ranks-with-no-gaps-when-there-are-ties-among-values). `X[ , id := frank(as.numeric(id), ties.method = "dense") - 1]` — Henrik, Oct 29 '18 at 20:48
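
The frank() suggestion in the comment above ranks the ids by their numeric value. A minimal sketch of what that does, using made-up ids that are not from the question (the names dt and new_id are only illustrative):

```r
library(data.table)

# Made-up ids (not from the question) whose string order differs from numeric order
dt <- data.table(id = c("20", "20", "100", "3", "3"))

# Dense rank of the numeric value: ties share a rank and no ranks are skipped,
# so subtracting 1 gives a gap-free sequence starting at 0
dt[, new_id := frank(as.numeric(id), ties.method = "dense") - 1L]

print(dt)
```

Here the smallest numeric id ("3") gets 0 regardless of where it first appears, which may or may not be what is wanted.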

akrun · Accepted Answer · 2018-10-29T20:47:04.977

We can convert 'id' to a factor and then coerce it to integer. Factor levels are sorted, so the new numbering follows the sorted order of the ids.

X[, id := as.character(as.integer(factor(id)) - 1)]

Or use match

X[, id := as.character(match(id, unique(id)) - 1)]

Another option is .GRP, grouping by id

X[, id := as.character(.GRP - 1), by = id]

identical(X, Y)
#[1] TRUE
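
The factor, match, and .GRP variants above agree on this example because the ids already appear in sorted order. A minimal sketch of how they can differ otherwise, using made-up ids that are not from the question (dt and the by_* column names are only illustrative):

```r
library(data.table)

# Made-up ids that are neither sorted nor in order of numeric value
dt <- data.table(id = c("20", "20", "100", "3", "3"))

# factor(): levels are sorted as strings, so "100" < "20" < "3"
dt[, by_factor := as.integer(factor(id)) - 1L]

# match()/unique(): numbering follows the order of first appearance
dt[, by_match := match(id, unique(id)) - 1L]

# .GRP with by=: groups are also numbered in order of first appearance
dt[, by_grp := .GRP - 1L, by = id]

print(dt)
```

If the numbering should follow first appearance, match or .GRP seems the safer choice; if it should follow sorted order, note that factor sorts the ids as strings unless they are converted to numeric first.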

Or using tidyverse

library(tidyverse)
X %>%
   mutate(id = as.character(match(id, unique(id)) - 1))

Or

X %>% 
  mutate(id = as.character(group_indices(., id) - 1))

Or

X %>% 
   mutate(id = as.character(cumsum(id != lag(id, default = first(id)))))

or with base R

X$id <- as.character(match(X$id, unique(X$id)) - 1)
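
For a check that needs no packages, the base R approach can be sketched on a plain data.frame. The name df is an assumption here; the data are the ones from the question:

```r
# Plain data.frame with the ids and words from the question
df <- data.frame(
  id = c("103", "103", "103", "104", "104", "160", "160"),
  content = c("I", "don't", "know", "some", "more", "words", "."),
  stringsAsFactors = FALSE
)

# Position of each id among the unique ids, shifted to start at 0,
# then converted back to character to match the desired output
df$id <- as.character(match(df$id, unique(df$id)) - 1)

print(df)
```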

edited Oct 29 '18 at 20:47

answered Oct 29 '18 at 20:34


Thank you so much for all the different approaches. I ended up using the very first solution using factors as this is the one I was thinking of at first, so it's closest to my initial understanding. But it is very interesting to see all the possibilities. – Klaus_Wuppertaler Nov 07 '18 at 11:06

markus · Answer 2 · 2018-10-29T20:43:28.707

Another option is rleid, which numbers consecutive runs of identical values

library(data.table)
X[, id := rleid(id) - 1L][]
#   id content
#1:  0       I
#2:  0   don't
#3:  0    know
#4:  1    some
#5:  1    more
#6:  2   words
#7:  2       .

If you want id to be of type character then do

X[, id := as.character(rleid(id) - 1L)]
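
One caveat worth checking against your data: rleid numbers runs, so an id that reappears after a different id starts a new run. A minimal sketch with made-up data (dt and the by_* column names are only illustrative) comparing it with match:

```r
library(data.table)

# Made-up data in which "103" reappears after "104"
dt <- data.table(id = c("103", "103", "104", "103"))

# rleid(): a new number for every run, so the second "103" run gets its own id
dt[, by_rleid := rleid(id) - 1L]

# match()/unique(): the same id always maps to the same number
dt[, by_match := match(id, unique(id)) - 1L]

print(dt)
```

When equal ids are always contiguous, as in the question, both give the same result.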

edited Oct 29 '18 at 20:43

answered Oct 29 '18 at 20:37


Thank you, that is very elegant! – Klaus_Wuppertaler Nov 07 '18 at 11:06
