
I would like to make the following sequence in R, by using rep or any other function.

c(1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5)

Basically, c(1:5, 2:5, 3:5, 4:5, 5:5).

Maël
Rene

4 Answers


Use sequence.

sequence(5:1, from = 1:5)
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

The first argument, nvec, is the length of each sequence (5:1); the second, from, is the starting point for each sequence (1:5).

Note: this works only for R >= 4.0.0. From R News 4.0.0:

sequence() [...] gains arguments [e.g. from] to generate more complex sequences.
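A small wrapper ties the two arguments together, so they cannot drift apart when the upper limit changes. This is only a sketch: the name `tri_seq` is hypothetical, n is assumed to be a positive integer, and the `from` argument means it needs R >= 4.0.0.

```r
# Hypothetical helper: builds c(1:n, 2:n, ..., n:n).
# Assumes n is a positive integer and R >= 4.0.0.
tri_seq <- function(n) {
  starts <- seq_len(n)   # 1, 2, ..., n: where each run begins
  lens <- rev(starts)    # n, n - 1, ..., 1: how long each run is
  sequence(lens, from = starts)
}

tri_seq(5)
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```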

Henrik
Maël
  • @Henrik A very similar question answered some time ago using `sequence`: https://stackoverflow.com/a/67887135/9463489 – jblood94 Jan 04 '22 at 16:09
unlist(lapply(1:5, function(i) i:5))
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

Here are some speed tests of all the answers provided. (If I recall correctly, the OP mentioned n = 10,000 somewhere.)

s1 <- function(n) { 
  unlist(lapply(1:n, function(i) i:n))
}

s2 <- function(n) {
  unlist(lapply(seq_len(n), function(i) seq(from = i, to = n, by = 1)))
}

s3 <- function(n) {
  vect <- 0:n
  unlist(replicate(n, vect <<- vect[-1]))
}

s4 <- function(n) {
  m <- matrix(1:n, ncol = n, nrow = n, byrow = TRUE)
  m[lower.tri(m)] <- 0
  c(t(m)[t(m != 0)])
}

s5 <- function(n) {
  m <- matrix(seq.int(n), ncol = n, nrow = n)
  m[lower.tri(m, diag = TRUE)]
}

s6 <- function(n) {
  out <- c()
  for (i in 1:n) { 
    out <- c(out, (1:n)[i:n])
  }
  out
}

library(rbenchmark)

n = 5

n = 5L

benchmark(
  "s1" = { s1(n) },
  "s2" = { s2(n) },
  "s3" = { s3(n) },
  "s4" = { s4(n) },
  "s5" = { s5(n) },
  "s6" = { s6(n) },
  replications = 1000,
  columns = c("test", "replications", "elapsed", "relative")
)

Do not be fooled by the solutions that look "fast" here: at this size they simply make hardly any function calls, and the small per-call differences are multiplied by the 1000 replications.

  test replications elapsed relative
1   s1         1000    0.05      2.5
2   s2         1000    0.44     22.0
3   s3         1000    0.14      7.0
4   s4         1000    0.08      4.0
5   s5         1000    0.02      1.0
6   s6         1000    0.02      1.0

n = 1000

n = 1000L

benchmark(
  "s1" = { s1(n) },
  "s2" = { s2(n) },
  "s3" = { s3(n) },
  "s4" = { s4(n) },
  "s5" = { s5(n) },
  "s6" = { s6(n) },
  replications = 10,
  columns = c("test", "replications", "elapsed", "relative")
)

As its poster already warned, the for loop (s6) becomes very slow compared to every other method at n = 1000L.

  test replications elapsed relative
1   s1           10    0.17    1.000
2   s2           10    0.83    4.882
3   s3           10    0.19    1.118
4   s4           10    1.50    8.824
5   s5           10    0.29    1.706
6   s6           10   28.64  168.471

n = 10000

n = 10000L

benchmark(
  "s1" = { s1(n) },
  "s2" = { s2(n) },
  "s3" = { s3(n) },
  "s4" = { s4(n) },
  "s5" = { s5(n) },
  # "s6" = { s6(n) },
  replications = 10,
  columns = c("test", "replications", "elapsed", "relative")
)

At large n, the matrix methods (s4 and s5) become very slow compared to the others. Using seq inside lapply (s2) might be neater, but it comes with a trade-off: calling that function n times adds a lot of processing time. seq_len(n), on the other hand, is nicer than 1:n and is only called once. Interestingly, the replicate method (s3) is the fastest here. The for loop (s6) was left out of this run.

  test replications elapsed relative
1   s1           10    5.44    1.915
2   s2           10    9.98    3.514
3   s3           10    2.84    1.000
4   s4           10   72.37   25.482
5   s5           10   35.78   12.599
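The `sequence` answer is missing from the tables above because it needs R >= 4.0.0. On a newer R it could be added to the comparison roughly as follows. The name `s7` is chosen here only to continue the numbering, and no timings are quoted because none were measured for these tables.

```r
# Assumes R >= 4.0.0, where sequence() accepts a from argument.
s7 <- function(n) {
  sequence(n:1, from = 1:n)
}

# Reference implementation (s1 from above) to check agreement before timing.
s1 <- function(n) {
  unlist(lapply(1:n, function(i) i:n))
}
stopifnot(all(s1(100L) == s7(100L)))

# Rough timing with base R only; results depend on machine and R version.
# Raise n to 10000L to match the last table.
n <- 1000L
system.time(for (k in 1:10) s7(n))
```

Adding `"s7" = { s7(n) }` to the `benchmark()` calls above would put it in the same table as the other methods.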
Merijn van Tilborg
  • Careful with this. It will misbehave if you change the first argument without remembering to change the second. For example, `unlist(lapply(1:10, function(i) i:5))` isn't right. Changing the second argument to `function(i) seq(from = i, to = 5, by = 1)` is a lot more verbose, but it's safer. The ultimate version is probably something like `output <- function(x) unlist(lapply(seq_len(x), function(i) seq(from = i, to = x, by = 1)))`. – J. Mini Jan 04 '22 at 22:42
  • Hi @Merijn van Tilborg! Perhaps you could include the `sequence` answer in the timings as well? Cheers – Henrik Jan 05 '22 at 11:15
  • I would have if I could, but I do not have an R version that supports the from argument. I expect it to be about as fast as s1 or s2, since the old sequence function is basically a wrapper: `function (nvec) unlist(lapply(nvec, seq_len))` – Merijn van Tilborg Jan 05 '22 at 11:54
  • Indeed, but it seems like that is [no longer the case](https://github.com/wch/r-source/blob/trunk/src/library/base/R/seq.R#L175-L177), so the timing may actually differ. – Henrik Jan 05 '22 at 12:43
  • A quick `system.time` with `sequence` and n = 10000 suggests that it is about 8-9 times faster than the `replicate` method. – Henrik Jan 05 '22 at 13:44
  • This could also be shortened to `unlist(lapply(1:5, ':', 5))`. – Robert Hacken Oct 31 '22 at 17:58

Your mention of rep reminded me of replicate, so here's a very stateful solution. I'm presenting this because it's short and unusual, not because it's good. This is very unidiomatic R.

vect <- 0:5
unlist(replicate(5, vect <<- vect[-1]))
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

You can do it with a combination of rep and lapply, but it's basically the same as Merijn van Tilborg's answer.

Of course, the truly fearless unidiomatic R user does this and refuses to elaborate further.

mat <- matrix(1:5, ncol = 5, nrow = 5, byrow = TRUE)
mat[lower.tri(mat)] <- 0
c(t(mat)[t(mat != 0)])
[1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
J. Mini
  • Your matrix alternative can be slightly simplified: `m = matrix(seq.int(n), ncol = n, nrow = n)`; `m[lower.tri(m, diag = TRUE)]` (less unidiomatic though) – Henrik Jan 05 '22 at 00:07
  • @Henrik Good job. I knew that something was off when I had to call `t` twice while using `byrow=TRUE`. – J. Mini Jan 05 '22 at 21:21
  • I fully understand. I have got lost in the maze of `upper/lower.tri`/`byrow`/"to `t` or not to `t`" soo many times myself. Your unidiomatic contribution is much appreciated. – Henrik Jan 05 '22 at 21:26
  • The indexing could be golfed with `row(m)>=col(m)` – Henrik Jan 06 '22 at 10:26
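The two simplifications suggested in these comments can be sketched as follows, with n = 5 assumed to match the question. Filling the matrix column by column makes every column equal to 1:n, so keeping the entries on or below the diagonal and reading them out column by column gives 1:n, then 2:n, and so on.

```r
n <- 5L
m <- matrix(seq.int(n), ncol = n, nrow = n)  # every column is 1:n, so m[i, j] == i

# Keep entries on or below the diagonal; extraction runs column by column.
m[lower.tri(m, diag = TRUE)]
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

# The same selection written with row() and col().
m[row(m) >= col(m)]
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```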

You could use a loop like so:

out <- c()
for (i in 1:5) {
  out <- c(out, (1:5)[i:5])  # grows the vector on every iteration
}
out
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

but that's not a good idea!


Why not use a loop?

Using a loop is:

  • slower,
  • less memory efficient, and
  • harder to read and understand.

By contrast, using a vectorised function like sequence is the opposite (faster, more efficient, and easy to read).


Further info

From ?sequence:

The default method for sequence generates the sequence seq(from[i], by = by[i], length.out = nvec[i]) for each element i in the parallel (and recycled) vectors from, by and nvec. It then returns the result of concatenating those sequences.

and about the from argument:

from: each element specifies the first element of a sequence.
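To make that description concrete, the same result can be built by hand with one `seq()` call per pair of elements. This is only an illustration of the documented behaviour, not how `sequence` is implemented internally, and the name `by_hand` is made up for this example.

```r
nvec <- 5:1   # length of each run
from <- 1:5   # first element of each run

# One seq() call per element, as described in ?sequence (by defaults to 1).
by_hand <- unlist(Map(function(f, n) seq(f, by = 1L, length.out = n), from, nvec))

by_hand
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5

# Requires R >= 4.0.0 for the from argument.
all(by_hand == sequence(nvec, from = from))
# [1] TRUE
```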

Also, since the vector used in the loop is not preallocated, it is copied into a new, longer vector on every iteration, which requires more memory and is also slower.
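If a loop is unavoidable, preallocating the result removes the repeated copying. The following is a minimal sketch under that assumption; the variable names are illustrative, and a vectorised function such as `sequence` is still the simpler choice.

```r
n <- 5L

# The total length is 1 + 2 + ... + n, so allocate it once up front.
out <- integer(n * (n + 1L) / 2L)

pos <- 0L  # number of elements filled so far
for (i in seq_len(n)) {
  len <- n - i + 1L                # length of the run i:n
  out[pos + seq_len(len)] <- i:n   # write into the next free slots
  pos <- pos + len
}

out
# [1] 1 2 3 4 5 2 3 4 5 3 4 5 4 5 5
```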

stevec