I'm trying to figure out if there's a way to do this that doesn't require a for loop.

I have a vector of data that increases sequentially but occasionally skips values. For example, test:

num[1:4651] 2 2 2 2 3 3 3 3 3 3 7 7 9 9 9 9, etc.

Is there an R function that will convert that vector into consecutive integers starting at 1, with no gaps, through the end of the vector? So,

1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4, etc.
989
  • `inverse.rle(list(values = 1:length(unique(x)), lengths = rle(x)$lengths))` – d.b Apr 25 '17 at 14:49
  • @d.b More standard: `inverse.rle(with(rle(x), list(values = seq_along(values), lengths = lengths)))` – Frank Apr 25 '17 at 14:53

2 Answers

We can use match for this; it returns the position of each element's first occurrence among the unique values:

match(test, unique(test))
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4

Another option is factor, with the levels fixed in order of appearance:

as.integer(factor(test, levels = unique(test)))
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4

As @Frank suggested, dense_rank from dplyr also works here because the values are increasing:

dplyr::dense_rank(test)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4
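
To illustrate why the increasing order matters, here is a minimal base R sketch. It assumes a made-up vector `unsorted` and uses `match(x, sort(unique(x)))` as a stand-in for dense ranking (assuming no NAs) so that it runs without dplyr. Numbering by first appearance and numbering by rank only agree when the values are already increasing.

```r
# Hypothetical vector whose values are NOT increasing
unsorted <- c(7, 7, 2, 2, 9)

# Number the values by order of first appearance
by_appearance <- match(unsorted, unique(unsorted))

# Number the values by sorted order (stand-in for dense ranking, no NAs assumed)
by_rank <- match(unsorted, sort(unique(unsorted)))

print(by_appearance)  # 1 1 2 2 3
print(by_rank)        # 2 2 1 1 3
```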

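If no value reappears after a different value has occurred, rleid can also be used. It numbers runs of identical adjacent values, so a value that comes back later would get a new index: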

data.table::rleid(test)
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4
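
The difference between value-based and run-based numbering can be sketched in base R as follows. The vector `recurring` is made up for illustration, and the run ids are built with `rle` and `rep` rather than data.table, so treat this as an approximation of what a run-length id does, not as the package's implementation.

```r
# Hypothetical vector in which the value 2 comes back after a run of 3s
recurring <- c(2, 2, 3, 3, 2, 2)

# Value-based: every 2 gets the same index
value_ids <- match(recurring, unique(recurring))

# Run-based: each new run of identical adjacent values gets a new index
runs <- rle(recurring)
run_ids <- rep(seq_along(runs$lengths), times = runs$lengths)

print(value_ids)  # 1 1 2 2 1 1
print(run_ids)    # 1 1 2 2 3 3
```

For a vector like the one in the question, where values never come back, both approaches give the same result.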

Or a base R option using rle (also run-based):

inverse.rle(within.list(rle(test), values <- seq_along(values)))
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4

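Another option is to flag each position where the value differs from its predecessor and take the cumulative sum of those flags: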

cumsum(c(TRUE, test[-1] != test[-length(test)]))
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4
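
Broken into steps, the cumsum approach might look like the sketch below. The intermediate names `is_start` and `group_id` are introduced here only for illustration.

```r
test <- c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 7, 7, 9, 9, 9, 9)

# Compare each element with its predecessor; the first element always starts a group
is_start <- c(TRUE, test[-1] != test[-length(test)])

# The running count of group starts is the group index
group_id <- cumsum(is_start)

print(which(is_start))  # 1 5 11 13
print(group_id)         # 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4
```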

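Or with lag from dplyr (namespaced so that stats::lag is not picked up by accident; using the first element as the default and adding 1 also works when the vector starts with 1)

cumsum(test != dplyr::lag(test, default = test[1])) + 1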
#[1] 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4

data

test <- c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 7, 7, 9, 9, 9, 9)
akrun

Using rle and rep in base R where vec is your vector:

with(rle(vec), rep(seq_along(lengths), times = lengths))
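
As a usage sketch, the expression can be wrapped in a small helper and applied to the sample data from the question. The function name `renumber_runs` is hypothetical; the body is the expression above.

```r
# Hypothetical helper name; the body is the rle/rep expression from this answer
renumber_runs <- function(vec) {
  with(rle(vec), rep(seq_along(lengths), times = lengths))
}

vec <- c(2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 7, 7, 9, 9, 9, 9)
result <- renumber_runs(vec)
print(result)  # 1 1 1 1 2 2 2 2 2 2 3 3 4 4 4 4
```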
989