Create numpy array of arrays with leading zeros and different start- and stop-points

Question

I have different integer start and stop values and need every integer from start to stop (inclusive), split into its digits and left-padded with zeros, as the rows of one array of shape (theRange, finalLength).

Example:

finalLength = 6
start = 2
stop = 3456
theRange = (stop - start) + 1


>>> array([[0, 0, 0, 0, 0, 2],
           [0, 0, 0, 0, 0, 3],
           [0, 0, 0, 0, 0, 4],
           ...,
           [0, 0, 3, 4, 5, 4],
           [0, 0, 3, 4, 5, 5],
           [0, 0, 3, 4, 5, 6]])

>>> array.shape
(3455, 6)
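To pin down the expected output exactly, here is a slow but easy-to-check reference sketch. It zero-pads each integer as text and splits it into digits. The name reference_digits is hypothetical and not part of my actual code; it is only meant for verifying faster versions, not as a solution.

```python
import numpy as np

def reference_digits(start, stop, final_length):
    # Hypothetical reference helper: pad each integer with leading zeros
    # as a string, then convert every character back to an integer digit.
    rows = [[int(ch) for ch in str(v).zfill(final_length)]
            for v in range(start, stop + 1)]
    return np.array(rows, dtype=np.int64)

expected = reference_digits(2, 3456, 6)
print(expected.shape)   # (3455, 6)
print(expected[0])      # first row, digits of 2
print(expected[-1])     # last row, digits of 3456
```

Any faster implementation can be compared against this with np.array_equal.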

As I need to run this function billions of times, the current way is too slow.

At the moment I create the desired range using np.linspace. The integers are then split into digits following the approach from "Split integer into digits using numpy".

If the number of digits of the largest number is smaller than finalLength, leading zeros are added. Finally, the resulting array is flipped and transposed to the desired output format. I think the integer splitting and the transposing take most of the computation time.
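To make these steps concrete, here is a minimal sketch on a tiny range. It is not the code I am timing: the helper name digits_padded is hypothetical, and it assumes non-negative integers and a finalLength of at most 19, so that every power of ten still fits into int64. Dividing by descending powers of ten yields the digits most-significant-first, so padding, flipping and transposing collapse into a single broadcast expression.

```python
import numpy as np

def digits_padded(start, stop, final_length):
    # Hypothetical helper; assumes 0 <= start <= stop and final_length <= 19.
    values = np.arange(start, stop + 1, dtype=np.int64)
    # Descending powers of ten: 10**(final_length - 1), ..., 10, 1
    powers = 10 ** np.arange(final_length - 1, -1, -1, dtype=np.int64)
    # Broadcast (n, 1) against (final_length,) to get shape (n, final_length).
    # Positions above the highest digit divide to 0, which gives the leading zeros.
    return values[:, None] // powers % 10

print(digits_padded(8, 12, 4))
# [[0 0 0 8]
#  [0 0 0 9]
#  [0 0 1 0]
#  [0 0 1 1]
#  [0 0 1 2]]
```

Whether this is actually faster for a given chunk size would have to be measured; for finalLength above 19 the int64 powers overflow and a different approach is needed.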

The run time increases with finalLength (timeit, 10000 repetitions, total time in seconds):

finalLength = 6 --> time: 2.815263898999546

finalLength = 12 --> time: 4.158567378000043

finalLength = 24 --> time: 5.038266787999419

Is there a faster way to create the final array?

Reproducible code:

import numpy as np

finalLength = 6
start = 2
stop = 3456
theRange = (stop - start) + 1

def makeRangeArray(start, stop, theRange, finalLength):
    # create integers within range
    ll = np.array(np.linspace(start=start, stop=stop, num=theRange), dtype=np.int64)

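    # split integers into digits (least significant digit first)
    b = 10
    # digit count of the largest value (valid for b = 10);
    # ceil(log(x) / log(b)) is one too small when x is an exact power of b, e.g. 1000
    n = len(str(int(np.max(ll))))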
    d = np.arange(n)
    d.shape = d.shape + (1,) * ll.ndim
    out = ll // b ** d % b

    # add leading zeros if necessary
    if finalLength - out.shape[0] != 0:
        addZeros = np.zeros([finalLength - out.shape[0], out.shape[1]], dtype=np.int64)
        out = np.append(out, addZeros, axis=0)  # insert zeros at the end of array

    # flip
    out = np.flip(out, axis=0)

    # transpose to desired final output format
    aaa = out.transpose().reshape((theRange, finalLength))

    return aaa

_As I need to run this function billions of times the current way is to slow._ I think your best bet is to find a way to not have to run anything billions of times, assuming that we're talking about a relatively short timespan, of course. — AMC, Feb 27 '20 at 01:45
To add to AMC's comment, if the same array is used every time, then generate once and use it as a global variable or pass the array between functions. If you need suggestions on how to optimise your workflow, then you can open another question. — Michael, Feb 27 '20 at 06:07
I want to create all possible combinations of numbers for a certain word length, e.g. 16 digits numbers with charset 0-9 will result in possible 10.000.000.000.000.000 combinations. To use np.unique() to filter for target combinations, e.g. discard all combinations having more than 8 duplicate digits, I need the number as arrays. Due to memory and cpu limitations I want to work with chunks of data. Hence, working with so many combinations to filter will result in billions of chunks. I am trying different approaches but until now I could not figure out one way that is fast enough. — STARmin, Mar 01 '20 at 11:13
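The filtering step described in the last comment could look roughly like the following sketch. It reads "more than 8 duplicate digits" as "some single digit occurs more than 8 times in a row of the array", which is an assumption, and max_digit_count is a hypothetical helper name. It counts digits for all rows at once instead of calling np.unique per row.

```python
import numpy as np

def max_digit_count(digit_rows):
    # digit_rows: (n, length) array holding digits 0-9.
    # Compare every entry against 0..9 -> (n, length, 10) booleans,
    # then sum over the length axis to get per-row counts of each digit.
    counts = (digit_rows[:, :, None] == np.arange(10)).sum(axis=1)
    # Highest count of any single digit in each row
    return counts.max(axis=1)

rows = np.array([[0, 0, 1, 2],
                 [3, 3, 3, 3],
                 [1, 2, 3, 4]])
limit = 2  # illustrative threshold; the comment above uses 8 for 16-digit numbers
keep = rows[max_digit_count(rows) <= limit]
print(keep)
# [[0 0 1 2]
#  [1 2 3 4]]
```

The intermediate boolean array has ten times as many entries as the chunk itself, so the chunk size would need to be chosen with that in mind.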
