Repeat but in variable sized chunks in numpy

Question

I have an array that is the concatenation of different chunks:

a = np.array([0, 1, 2, 10, 11, 20, 21, 22, 23])
#             >     <  >    <  >            <
chunks = np.array([3, 2, 4])
repeats = np.array([1, 3, 2])

Each segment starting with a new decade in the example above is a separate "chunk" that I would like to repeat. The chunk sizes and number of repetitions are known for each. I can't do a reshape followed by kron or repeat because the chunks are different sizes.

The result I would like is

np.array([0, 1, 2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])
# repeats:>  1  <  >         3          <  >              2             <

This is easy to do in a loop:

in_offset = np.r_[0, np.cumsum(chunks[:-1])]
out_offset = np.r_[0, np.cumsum(chunks[:-1] * repeats[:-1])]
output = np.zeros((chunks * repeats).sum(), dtype=a.dtype)
for c in range(len(chunks)):
    for r in range(repeats[c]):
        for i in range(chunks[c]):
            output[out_offset[c] + r * chunks[c] + i] = a[in_offset[c] + i]

This leads to the following vectorization:

regions = chunks * repeats
index = np.arange(regions.sum())

segments = np.repeat(chunks, repeats)
resets = np.cumsum(segments[:-1])
offsets = np.zeros_like(index)
offsets[resets] = segments[:-1]
offsets[np.cumsum(regions[:-1])] -= chunks[:-1]

index -= np.cumsum(offsets)

output = a[index]

Is there a more efficient way to vectorize this problem? Just so we are clear, I am not asking for a code review. I am happy with how these function calls work together. I would like to know if there is an entirely different (more efficient) combination of function calls I could use to achieve the same result.

This question was inspired by my answer to this question.

score 1 · Answer 1 · answered Aug 20 '20 at 19:35

A more numpythonic way to do your task (than the other answer) is:

result = np.concatenate([ np.tile(tbl, rpt) for tbl, rpt in
    zip(np.split(a, np.cumsum(chunks[:-1])), repeats) ])

The result is:

array([ 0,  1,  2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])

score 1 · Answer 2 · answered Aug 20 '20 at 20:06

1

For those chunks being range arrays, we can directly work on the input array and thus avoid the final indexing step and that should improve things -

# https://stackoverflow.com/a/47126435/ @Divakar
def create_ranges(starts, ends, l):
    clens = l.cumsum()
    ids = np.ones(clens[-1],dtype=int)
    ids[0] = starts[0]
    ids[clens[:-1]] = starts[1:] - ends[:-1]+1
    out = ids.cumsum()
    return out

s = np.r_[0,chunks.cumsum()]
starts = a[np.repeat(s[:-1],repeats)]
l = np.repeat(chunks, repeats)
ends = starts+l
out = create_ranges(starts, ends, l)

answered Aug 20 '20 at 20:06

Divakar

218,885
19
262
358

Akshay's answer is by far the most elegant. – Mad Physicist Aug 20 '20 at 20:32
This answer does not produce the correct output when the chunks in the input array are not sequential. For example, if we modify the array a in the original post a little bit: `a = np.array([0, 5, 2, 10, 15, 20, 21, 22, 23])` we get `output = [ 0 1 2 10 11 10 11 10 11 20 21 22 23 20 21 22 23]` instead of the correct solution `output = [ 0 5 2 10 15 10 15 10 15 20 21 22 23 20 21 22 23]` – neuro630 Mar 01 '23 at 22:01

Akshay Sehgal · Accepted Answer · 2020-08-20T21:05:49.457

1

An even more "numpythonic" way of solving this than the other answer is -

np.concatenate(np.repeat(np.split(a, np.cumsum(chunks))[:-1], repeats))

array([ 0,  1,  2, 10, 11, 10, 11, 10, 11, 20, 21, 22, 23, 20, 21, 22, 23])

Notice, no explicit for-loops.

(np.split has an implicit loop as pointed out by @Divakar).

EDIT: Benchmarks (MacBook pro 13) -

Divakar's solution scales better for larger arrays, chunks and repeats as @Mad Physicist pointed out in his post.

edited Aug 20 '20 at 21:05

answered Aug 20 '20 at 20:29

Akshay Sehgal

18,741
3
21
51

Should be faster than other solutions. I am adding the results in my answer – Akshay Sehgal Aug 20 '20 at 20:33
Elegant for sure. Actually `np.split` has loop and that will show up when we start dealing with many split arrays. To verify loop : Source code - https://github.com/numpy/numpy/blob/v1.19.0/numpy/lib/shape_base.py#L799-L874 uses `array_split` > https://github.com/numpy/numpy/blob/v1.19.0/numpy/lib/shape_base.py#L739-L792. – Divakar Aug 20 '20 at 20:43
1

Your benchmark is for a tiny array I suspect (the example in the question?), and `create_ranges` should be outside of `f` for a fair comparison, or better yet, inlined. Also, you can use `%%timeit` cell magic to time multiple lines simultaneously. – Mad Physicist Aug 20 '20 at 20:57
oh ok, let me change that and rerun. I don't think it should cause a lot of difference since function call is the only thing I am timing. But will update accordingly – Akshay Sehgal Aug 20 '20 at 20:59
updated, you are right about the larger data benchmarks, it skipped my mind apologies. The ```create_ranges()``` inside ```f()``` doesn't affect the runtime, I double-checked. That's because %timeit was only benchmarking the function call of ```f()``` – Akshay Sehgal Aug 20 '20 at 21:06

Mad Physicist · Answer 4 · 2020-08-20T20:54:56.603

For informational purposes, I've benchmarked the working solutions here.:

def MadPhysicist1(a, chunks, repeats):
    in_offset = np.r_[0, np.cumsum(chunks[:-1])]
    out_offset = np.r_[0, np.cumsum(chunks[:-1] * repeats[:-1])]
    output = np.zeros((chunks * repeats).sum(), dtype=a.dtype)
    for c in range(len(chunks)):
        for r in range(repeats[c]):
            for i in range(chunks[c]):
                output[out_offset[c] + r * chunks[c] + i] = a[in_offset[c] + i]
    return output

def MadPhysicist2(a, chunks, repeats):
    regions = chunks * repeats
    index = np.arange(regions.sum())

    segments = np.repeat(chunks, repeats)
    resets = np.cumsum(segments[:-1])
    offsets = np.zeros_like(index)
    offsets[resets] = segments[:-1]
    offsets[np.cumsum(regions[:-1])] -= chunks[:-1]

    index -= np.cumsum(offsets)

    output = a[index]
    return output

def create_ranges(starts, ends, l):
    clens = l.cumsum()
    ids = np.ones(clens[-1],dtype=int)
    ids[0] = starts[0]
    ids[clens[:-1]] = starts[1:] - ends[:-1]+1
    out = ids.cumsum()
    return out

def Divakar(a, chunks, repeats):
    s = np.r_[0, chunks.cumsum()]
    starts = a[np.repeat(s[:-1], repeats)]
    l = np.repeat(chunks, repeats)
    ends = starts+l
    return create_ranges(starts, ends, l)

def Valdi_Bo(a, chunks, repeats):
    return np.concatenate([np.tile(tbl, rpt) for tbl, rpt in
                           zip(np.split(a, np.cumsum(chunks[:-1])), repeats)])

def AkshaySehgal(a, chunks, repeats):
    return np.concatenate(np.repeat(np.split(a, np.cumsum(chunks))[:-1], repeats))

I've looked at the timings for three input sizes: ~100, ~1000 and ~10k elements:

np.random.seed(0xA)
chunksA = np.random.randint(1, 10, size=20)   # ~100 elements
repeatsA = np.random.randint(1, 10, size=20)
arrA = np.random.randint(100, size=chunksA.sum())

np.random.seed(0xB)
chunksB = np.random.randint(1, 100, size=20)  # ~1000 elements
repeatsB = np.random.randint(1, 10, size=20)
arrB = np.random.randint(100, size=chunksB.sum())

np.random.seed(0xC)
chunksC = np.random.randint(1, 100, size=200)  # ~10000 elements
repeatsC = np.random.randint(1, 10, size=200)
arrC = np.random.randint(100, size=chunksC.sum())

Here are some results:

|               |    A    |    B    |    C    |
+---------------+---------+---------+---------+
| MadPhysicist1 | 1.92 ms |   16 ms |  159 ms |
| MadPhysicist2 | 85.5 µs |  153 µs |  744 µs |
| Divakar       | 75.9 µs | 95.9 µs |  312 µs |
| Valdi_Bo      |  370 µs |  369 µs |  3.4 ms |
| AkshaySehgal  |  163 µs |  165 µs | 1.24 ms |

Do check my benchmarks imgur.com/4NSw5ro. I think it has to do with the variance in runtimes. — Akshay Sehgal, Aug 20 '20 at 20:49
@AkshaySehgal. I can't check regular imgur at work, but perhaps you could upload to stack overflow imgur? — Mad Physicist, Aug 20 '20 at 20:50
oh, never done that before, could you share the link for that — Akshay Sehgal, Aug 20 '20 at 20:50
@AkshaySehgal. I would edit your question to include the image, then remove it from the question, but keep a link to the uploaded file. — Mad Physicist, Aug 20 '20 at 20:51
@AkshaySehgal. I've also made this post into a community wiki, so feel free to edit in anything you happen to find useful — Mad Physicist, Aug 20 '20 at 20:52
@AkshaySehgal. I would guess that the biggest differences would come in from the number of chunks and the number of repeats of each. — Mad Physicist, Aug 20 '20 at 20:53
cool, updated my answer for your reference, i am using jupyter notebook magic command ```%timeit``` — Akshay Sehgal, Aug 20 '20 at 20:54

score -1 · Answer 5 · answered Aug 20 '20 at 18:24

-1

    for rep, num in zip(repeats, chunks):
        res.extend(list(range(num))*rep)

[0, 1, 2, 0, 1, 0, 1, 0, 1, 0, 1, 2, 3, 0, 1, 2, 3]

answered Aug 20 '20 at 18:24

galaxyan

5,944
2
19
43

My question was a bit unclear, so I updated it. The sequence 0-chunk_size was incidental. It can be completely random numbers. My goal is to actually replicate the content of the input array, not generate it, e.g. with `range` – Mad Physicist Aug 20 '20 at 18:30

Repeat but in variable sized chunks in numpy

5 Answers5

Linked