
Inspired by the post "How to create a sequence of sequences of numbers in R?".


Question:

I would like to make the following sequence in NumPy.

[1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]

I have tried the following:

  • Hard-coding the ranges with np.r_ (not generic)
    np.r_[1:6, 2:6, 3:6, 4:6, 5:6]
    # array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
    
  • Pure Python to generate the desired array.
    n = 5
    a = np.r_[1:n+1]
    [i for idx in range(a.shape[0]) for i in a[idx:]]
    # [1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5]
    
  • Create a 2D array and take the upper triangle from it.
    n = 5
    a = np.r_[1:n+1]
    arr = np.tile(a, (n, 1))
    print(arr)
    # [[1 2 3 4 5]
    #  [1 2 3 4 5]
    #  [1 2 3 4 5]
    #  [1 2 3 4 5]
    #  [1 2 3 4 5]]
    
    o = np.triu(arr).flatten()
    # 1D array, wrapped here by row for readability:
    # array([1, 2, 3, 4, 5,
    #        0, 2, 3, 4, 5,
    #        0, 0, 3, 4, 5,
    #        0, 0, 0, 4, 5,
    #        0, 0, 0, 0, 5])
    
    out = o[o > 0]  # drop the zeros below the diagonal (works because every value is >= 1)
    # array([1, 2, 3, 4, 5, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5])
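The hard-coded np.r_ call can be made generic by building the slices in a loop. This is a sketch relying on the fact that np.r_ treats a tuple index the same way as comma-separated slices; the helper name `r_sequence` is hypothetical, and it assumes n >= 1 (np.r_ has nothing to concatenate for n = 0).

```python
import numpy as np

def r_sequence(n):
    # np.r_[1:6, 2:6] receives the key (slice(1, 6), slice(2, 6)), so passing a
    # tuple of slices built in a loop generalizes the hard-coded version.
    # np.r_ still calls np.arange once per slice, so this improves generality,
    # not vectorization.
    return np.r_[tuple(slice(start, n + 1) for start in range(1, n + 1))]

print(r_sequence(5))
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```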
    

The last solution is generic, but I'd like to know if there is a more efficient way to do this in NumPy.
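A variant of the upper-triangle idea, offered as an unbenchmarked sketch: np.triu_indices(n) returns the row and column indices of the upper triangle in row-major order, and column j of the tiled matrix holds the value j + 1, so the column indices plus one are the desired sequence. This skips building the tiled matrix and the `o > 0` filter, though it still materializes two index arrays of length n(n+1)/2. The name `triu_sequence` is hypothetical.

```python
import numpy as np

def triu_sequence(n):
    # For n = 5: rows = 0,0,0,0,0,1,1,1,1,...  cols = 0,1,2,3,4,1,2,3,4,...
    rows, cols = np.triu_indices(n)
    # Column j of np.tile(np.r_[1:n+1], (n, 1)) contains j + 1.
    return cols + 1

print(triu_sequence(5))
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```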

Ch3steR

2 Answers


I'm not sure if this is a good idea, but I benchmarked it against your pure-Python method and it seems to be faster:

np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])

Here is the full code:

import numpy as np
from time import time

n = 5000

t = time()
c = np.concatenate([np.arange(i, n+1) for i in range(1, n+1)])
print(time() - t)
# 0.039876699447631836

t = time()
a = np.r_[1:n+1]
b = np.array([i for idx in range(a.shape[0]) for i in a[idx:]])
print(time() - t)
# 2.0875167846679688

print(np.array_equal(b, c))
# True
tangolin
  • Faster than both the mentioned methods. Nice one. – Ch3steR Jan 05 '22 at 13:51
  • I didn't expect `concat` to be so fast! I assume that it calculates the size of the final array in advance so it can allocate all the memory at once. – azelcer Jan 05 '22 at 15:41
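The sizing argument in the comment above can be made explicit: the output length is known up front, n + (n - 1) + ... + 1 = n(n+1)/2, and np.concatenate accepts an `out=` array of exactly the result shape. This sketch preallocates that array; it illustrates the idea and makes no claim about how concatenate is implemented internally. The name `concat_preallocated` is hypothetical, and it assumes n >= 1.

```python
import numpy as np

def concat_preallocated(n):
    # Total length n + (n - 1) + ... + 1.
    total = n * (n + 1) // 2
    out = np.empty(total, dtype=np.int64)
    # Same pieces as the answer, with an explicit dtype matching `out`.
    pieces = [np.arange(i, n + 1, dtype=np.int64) for i in range(1, n + 1)]
    # np.concatenate writes into `out`, which must have the exact result shape.
    np.concatenate(pieces, out=out)
    return out

print(concat_preallocated(5))
# [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```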

A plain Python (no NumPy) way is:

n = 5
a = [r for start in range(1, n+1) for r in range(start, n+1)]

This is faster for small n (up to roughly n = 150) but slower than @tangolin's solution for larger n. It is still faster than the OP's "pure Python" approach.
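Where the crossover falls depends on the machine and library versions, so it is worth measuring locally. A minimal timeit harness, assuming both candidates are wrapped in functions (the names `listcomp_sequence`, `concat_sequence` and `compare` are hypothetical). Note that the list comprehension returns a Python list, while the concatenate version returns an ndarray, as in the original comparison.

```python
import timeit
import numpy as np

def listcomp_sequence(n):
    # Plain Python: a new range object per starting value.
    return [r for start in range(1, n + 1) for r in range(start, n + 1)]

def concat_sequence(n):
    # @tangolin's np.concatenate of aranges.
    return np.concatenate([np.arange(i, n + 1) for i in range(1, n + 1)])

def compare(n, number=20):
    # Total seconds for `number` calls of each version: (listcomp, concatenate).
    t_list = timeit.timeit(lambda: listcomp_sequence(n), number=number)
    t_concat = timeit.timeit(lambda: concat_sequence(n), number=number)
    return t_list, t_concat

for n in (10, 150, 1000):
    t_list, t_concat = compare(n)
    print(f"n={n:5d}  listcomp={t_list:.4f}s  concatenate={t_concat:.4f}s")
```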

A faster implementation prepares the data once in advance, avoiding creating a new range each time:

source = np.arange(1, n+1)
d = np.concatenate([source[i:] for i in range(n)])
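Why this avoids new ranges: basic slicing of a NumPy array returns a lightweight view that shares the data buffer of `source`, whereas every np.arange call allocates a fresh array. A quick check with np.shares_memory (the variable names `tail` and `fresh` are just for illustration):

```python
import numpy as np

n = 5
source = np.arange(1, n + 1)

# Basic slicing returns a view into `source`; no element data is copied.
tail = source[2:]
print(np.shares_memory(tail, source))   # True

# np.arange builds a new array with its own buffer on every call.
fresh = np.arange(3, n + 1)
print(np.shares_memory(fresh, source))  # False

# np.concatenate copies all the views into one newly allocated result.
d = np.concatenate([source[i:] for i in range(n)])
print(d)                                # [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
print(np.shares_memory(d, source))      # False
```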

NOTE

My original implementation both preallocated the result array and prepared the data in advance, but it was not very Pythonic. I switched to concatenate after reading @tangolin's answer and realizing that concatenate achieves the same thing.

Original implementation:

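import numpy as np

n = 5
# Preallocate the result: n + (n-1) + ... + 1 = n*(n+1)//2 elements.
e = np.empty((n*(n+1)//2, ), dtype='int64')
source = np.arange(1, n+1)
for i in range(n):
    # Row i starts after the previous rows of lengths n, n-1, ..., n-i+1.
    init = n * i - i*(i-1)//2
    end = n - i + init  # row i holds n - i elements
    e[init:end] = source[i:n]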
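The `init` formula is the running total of the row lengths n, n - 1, ..., so np.cumsum can compute all offsets at once. Taking that one step further gives a loop-free variant. This is a sketch of the common "cumsum of steps" trick, not something from the answers above, and no timing claim is made for it: fill an array with steps of 1 and, at each row start, store the jump from n back down to the next starting value. The function names `row_offsets` and `cumsum_sequence` are hypothetical.

```python
import numpy as np

def row_offsets(n):
    # Row i (0-based) has n - i elements; offsets are the exclusive cumsum.
    lengths = np.arange(n, 0, -1)
    return np.cumsum(lengths) - lengths

def cumsum_sequence(n):
    lengths = np.arange(n, 0, -1)
    starts = np.cumsum(lengths) - lengths
    # Every element is the previous one plus 1 ...
    steps = np.ones(lengths.sum(), dtype=np.int64)
    # ... except at the start of row i (i >= 1), where the value drops from n
    # (end of the previous row) to i + 1, a step of (i + 1) - n.
    steps[starts[1:]] = np.arange(2, n + 1) - n
    # steps[0] is 1, so the running sum gives 1, 2, ..., n, 2, 3, ...
    return np.cumsum(steps)

n = 5
i = np.arange(n)
# Same offsets as the scalar formula init = n * i - i*(i-1)//2.
print(np.array_equal(row_offsets(n), n * i - i * (i - 1) // 2))  # True
print(cumsum_sequence(n))  # [1 2 3 4 5 2 3 4 5 3 4 5 4 5 5]
```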
azelcer