Given a boundary value k, is there a vectorized way to replace each number n with the consecutive descending numbers from n-1 down to k? For example, if k is 0 then I'd like to replace np.array([3,4,2,2,1,3,1]) with np.array([2,1,0,3,2,1,0,1,0,1,0,0,2,1,0,0]). Every item of the input array is greater than k.

I have tried a combination of np.repeat and np.cumsum, but it feels like a roundabout solution:

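import numpy as np

x = np.array([3,4,2,2,1,3,1])
y = np.repeat(x, x)
t = -np.ones(y.shape[0], dtype=int)   # steps of -1; dtype=int keeps the result integral
t[np.r_[0, np.cumsum(x)[:-1]]] = x-1  # jump up at the start of each run (valid for k = 0)
np.cumsum(t)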

Is there any other way? I was expecting something like an inverse of np.add.reduceat that can expand integers into decreasing sequences instead of reducing them.

mathfux

2 Answers

Here's another way with array-assignment to skip the repeat part -

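def func1(a):
    l = a.sum()                      # total output length (k = 0)
    out = np.full(l, -1, dtype=int)  # default step of -1 everywhere
    out[0] = a[0]-1                  # first run starts at a[0]-1
    idx = a.cumsum()[:-1]            # start index of every later run
    out[idx] = a[1:]-1               # jump from 0 up to the next run's start
    return out.cumsum()              # accumulate the steps into values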
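
To see why the final cumulative sum produces the runs, it may help to trace the same steps on a small input. The sketch below uses a two-element array chosen purely for illustration; the intermediate values in the comments are what each step should leave in `out`.

```python
import numpy as np

a = np.array([3, 2])                    # illustrative input, not from the question
out = np.full(a.sum(), -1, dtype=int)   # [-1, -1, -1, -1, -1]
out[0] = a[0] - 1                       # [ 2, -1, -1, -1, -1]
idx = a.cumsum()[:-1]                   # [3]: index where the second run starts
out[idx] = a[1:] - 1                    # [ 2, -1, -1,  1, -1]
result = out.cumsum()                   # [ 2,  1,  0,  1,  0]
print(result)
```

Each run ends at 0, so writing `a[i]-1` at the start of the next run is exactly the jump needed to restart the countdown.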

Benchmarking

# OP's soln
def OP(x):
    y = np.repeat(x, x)
    t = -np.ones(y.shape[0], dtype=int)
    t[np.r_[0, np.cumsum(x)[:-1]]] = x-1
    return np.cumsum(t)

Using the benchit package (a few benchmarking tools packaged together; disclaimer: I am its author) to benchmark the proposed solutions:

import benchit

a = np.array([3,4,2,2,1,3,1])
in_ = [np.resize(a,n) for n in [10, 100, 1000, 10000]]
funcs = [OP, func1]
t = benchit.timings(funcs, in_)
t.plot(logx=True, save='timings.png')

[Figure: timings.png, the benchit timing plot for OP and func1 (log-scaled x-axis)]

Extend to take k as arg

def func1(a, k):
    l = a.sum()+len(a)*(-k)
    out = np.full(l, -1, dtype=int)
    out[0] = a[0]-1
    idx = (a-k).cumsum()[:-1]
    out[idx] = a[1:]-1-k
    return out.cumsum()

Sample run -

In [120]: a
Out[120]: array([3, 4, 2, 2, 1, 3, 1])

In [121]: func1(a, k=-1)
Out[121]: 
array([ 2,  1,  0, -1,  3,  2,  1,  0, -1,  1,  0, -1,  1,  0, -1,  0, -1,
        2,  1,  0, -1,  0, -1])
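
As a sanity check, the k-aware version can be compared against a slow, obviously correct loop. This is a minimal sketch: `reference` is a hypothetical helper introduced here for illustration, and the values of k are arbitrary choices that satisfy the question's condition that every input item is greater than k.

```python
import numpy as np

def func1(a, k):
    # Copied from the answer above.
    l = a.sum() + len(a) * (-k)
    out = np.full(l, -1, dtype=int)
    out[0] = a[0] - 1
    idx = (a - k).cumsum()[:-1]
    out[idx] = a[1:] - 1 - k
    return out.cumsum()

def reference(a, k):
    # Hypothetical helper: plain Python loops, slow but easy to verify by eye.
    res = []
    for n in a:
        res.extend(range(int(n) - 1, k - 1, -1))  # n-1, n-2, ..., k
    return np.array(res, dtype=int)

a = np.array([3, 4, 2, 2, 1, 3, 1])
for k in (0, -1, -3):
    assert np.array_equal(func1(a, k), reference(a, k))
```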
Divakar
  • So there are no hidden methods or approaches in `numpy` for this, as I had been hoping for. A 2x speedup is much better though, thanks for the collaboration. This helped me speed up [`itertools.combinations`](https://stackoverflow.com/a/63694661/3044825) by up to 1.5 times! – mathfux Sep 05 '20 at 17:28
This is concise and probably OK for efficiency. I don't think apply is vectorized here, so the run time will be limited mostly by the number of elements in the original array (and less so by their values, is my guess):

import numpy as np
import pandas as pd
x = np.array([3,4,2,2,1,3,1])

# One descending range per element (val-1 down to 0), then flatten
values = pd.Series(x).apply(lambda val: np.arange(val-1,-1,-1)).values
output = np.concatenate(values)
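
Since apply here is essentially a Python-level loop, the same idea can be sketched without pandas using a list comprehension, which also makes it easy to pass the boundary k. The function name `descending_runs` is a hypothetical one chosen for this example, and it assumes a non-empty input in which every item is greater than k.

```python
import numpy as np

def descending_runs(x, k=0):
    # Hypothetical helper: one np.arange per element, counting n-1, n-2, ..., k.
    return np.concatenate([np.arange(n - 1, k - 1, -1) for n in x])

x = np.array([3, 4, 2, 2, 1, 3, 1])
output = descending_runs(x)
print(output)
```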
anon01