
I'm having a play with R, and am struggling to get to grips with the different programming style required.

The task is this: given a sequence of numbers, e.g. (1,2,3,3,3,4,5,5,1), work out for each point how many consecutive preceding points have the same value. For this example the answer would be (0,0,0,1,2,0,0,1,0).

In a more conventional programming language e.g. Python I'd do something like this:

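flat_count = 0
result = [0] * len(seq)  # the first element has no predecessor, so it stays 0
for i in range(1, len(seq)):
    if seq[i] == seq[i-1]:
        flat_count += 1
    else:
        flat_count = 0
    result[i] = flat_count  # store separately so seq is not overwritten mid-loop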

Since my impression is that for loops in R should be avoided at all costs, I'm a bit confused as to where to begin.

My best attempt so far is as follows:

runs <- rle(seq)
seqs <- sapply(runs$lengths, FUN=seq)

I'm not sure whether this is a particularly efficient approach, and if it is, I'm not sure how to concatenate the resulting list in seqs into a single vector.

Any help appreciated, or just general best practices for R.

Thanks

WMycroft

1 Answer


We can use sequence, which is essentially a wrapper for unlist(lapply(yourvector, seq_len)). It loops (lapply) through the values of the vector, gets the sequence 1..n for each value (seq_len), and unlists the result into a single vector.

 sequence(runs$lengths)-1
 #[1] 0 0 0 1 2 0 0 1 0

We are subtracting 1 from the output to get the desired output.
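
To make the equivalence concrete, here is a minimal self-contained sketch in base R. It rebuilds the example vector from the question and compares sequence with the longer lapply/seq_len/unlist spelling; the names a, b and res are only illustrative.

```r
# Example input from the question
v1 <- c(1, 2, 3, 3, 3, 4, 5, 5, 1)

# Run-length encoding: lengths of each run of equal consecutive values
runs <- rle(v1)

# Count 1..n within each run
a <- sequence(runs$lengths)

# The same idea written out step by step
b <- unlist(lapply(runs$lengths, seq_len))

# Both spellings should agree element by element
stopifnot(all(a == b))

# Shift so that each run starts at 0 instead of 1
res <- a - 1
print(res)
#[1] 0 0 0 1 2 0 0 1 0
```

This also covers the concatenation step asked about in the question: unlist flattens the per-run list into one vector.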


Another option is using rleid from the devel version of data.table i.e. v1.9.5. Instructions to install the devel version are here

 library(data.table)  # v1.9.5+
 setDT(list(v1))[, seq_along(V1) - 1, by = rleid(V1)]$V1
 #[1] 0 0 0 1 2 0 0 1 0

We convert 'v1' to a 'data.table', group by rleid(V1), take the sequence along 'V1' within each group, and subtract 1 from it.
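
For readers without data.table, the grouping logic can be sketched in base R. This is a stand-in for illustration, not how rleid is implemented: the run id is built with cumsum over "value changed" flags, and ave does the per-group counting. The names changed and run_id are hypothetical.

```r
# Example input from the question
v1 <- c(1, 2, 3, 3, 3, 4, 5, 5, 1)

# TRUE wherever a value differs from its predecessor (first element always starts a run)
changed <- c(TRUE, v1[-1] != v1[-length(v1)])

# Running total of changes gives a run id per element: 1 2 3 3 3 4 5 5 6
run_id <- cumsum(changed)

# Count positions within each run, then shift to start at 0
res <- ave(v1, run_id, FUN = seq_along) - 1
print(res)
#[1] 0 0 0 1 2 0 0 1 0
```

The run id plays the same role as the rleid(V1) grouping column: elements in the same run share an id, so counting within each id gives the position inside the run.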

data

 v1 <- c(1,2,3,3,3,4,5,5,1)
 runs <- rle(v1) 
akrun