6

Given two arrays, say

arr = array([10, 24, 24, 24,  1, 21,  1, 21,  0,  0], dtype=int32)
rep = array([3, 2, 2, 0, 0, 0, 0, 0, 0, 0], dtype=int32)

np.repeat(arr, rep) returns

array([10, 10, 10, 24, 24, 24, 24], dtype=int32)

Is there any way to replicate this functionality for a set of 2D arrays?

That is, given

arr = array([[10, 24, 24, 24,  1, 21,  1, 21,  0,  0],
            [10, 24, 24,  1, 21,  1, 21, 32,  0,  0]], dtype=int32)
rep = array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
            [2, 2, 2, 0, 0, 0, 0, 0, 0, 0]], dtype=int32)

is it possible to write a vectorized function that does this?

PS: The number of repeats in each row need not be the same. I'm padding each result row so that all rows have the same size.

import numpy as np

def repeat2d(arr, rep):
    # Longest expanded row; every result row is padded to this length.
    max_len = rep.sum(axis=-1).max()
    # Common array to hold all results. np.empty leaves the padding
    # uninitialized; use np.zeros here if the padding should be zero.
    ret_val = np.empty((arr.shape[0], max_len))
    for i in range(arr.shape[0]):
        # The repeated row may be shorter than max_len.
        temp = np.repeat(arr[i], rep[i])
        ret_val[i, :temp.size] = temp
    return ret_val

I do know about np.vectorize and I know that it does not give any performance benefits over the normal version.

wwl
Aditya369

2 Answers

4

So you have a different repeat array for each row? But the total number of repeats per row is the same?

Just do the repeat on the flattened arrays, and reshape back to the correct number of rows.

In [529]: np.repeat(arr,rep.flat)
Out[529]: array([10, 10, 10, 24, 24, 24, 24, 10, 10, 24, 24, 24, 24,  1])
In [530]: np.repeat(arr,rep.flat).reshape(2,-1)
Out[530]: 
array([[10, 10, 10, 24, 24, 24, 24],
       [10, 10, 24, 24, 24, 24,  1]])
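As a standalone script, the flatten-and-reshape idea might look like the sketch below. The second row of `rep` here is an assumption inferred from the output above (it makes both rows expand to 7 elements); the reshape only works when every row expands to the same length, so the sketch checks that first.

```python
import numpy as np

arr = np.array([[10, 24, 24, 24, 1, 21, 1, 21, 0, 0],
                [10, 24, 24, 1, 21, 1, 21, 32, 0, 0]], dtype=np.int32)
# Assumed repeat counts: the second row is inferred from the output shown above.
rep = np.array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
                [2, 2, 2, 1, 0, 0, 0, 0, 0, 0]], dtype=np.int32)

lens = rep.sum(axis=1)
# The reshape is only valid when every row expands to the same length.
assert (lens == lens[0]).all(), "rows expand to different lengths"

# Repeat over the flattened arrays, then fold back into one row per input row.
out = np.repeat(arr.ravel(), rep.ravel()).reshape(arr.shape[0], -1)
print(out)
```

If the assertion fails, the rows need padding, which is the case discussed next.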

If the repetitions per row vary, we have the problem of padding variable-length rows. That has come up in other SO questions. I don't recall all the details, but I think the solution is along these lines:

Change rep so the numbers differ:

In [547]: rep
Out[547]: 
array([[3, 2, 2, 0, 0, 0, 0, 0, 0, 0],
       [2, 2, 2, 1, 0, 2, 0, 0, 0, 0]])
In [548]: lens=rep.sum(axis=1)
In [549]: lens
Out[549]: array([7, 9])
In [550]: m=np.max(lens)
In [551]: m
Out[551]: 9

create the target:

In [552]: res = np.zeros((arr.shape[0],m),arr.dtype)

create an indexing array - details need to be worked out:

In [553]: idx=np.r_[0:7,m:m+9]
In [554]: idx
Out[554]: array([ 0,  1,  2,  3,  4,  5,  6,  9, 10, 11, 12, 13, 14, 15, 16, 17])
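One way the indexing array could be built for any number of rows, rather than hard-coding the slices in `np.r_`, is sketched here. It is only an illustration: it compares column positions against each row's length and takes the flat positions of the cells that should be filled. The `lens` values are the ones from the session above.

```python
import numpy as np

lens = np.array([7, 9])      # repeats per row, as in the session above
m = lens.max()

# True where a column position is actually filled in that row
mask = np.arange(m) < lens[:, None]
# Flat (row-major) positions of the filled cells
idx = np.flatnonzero(mask)
print(idx)
```

For these lengths this gives the same positions as `np.r_[0:7, m:m+9]`.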

flat indexed assignment:

In [555]: res.flat[idx]=np.repeat(arr,rep.flat)
In [556]: res
Out[556]: 
array([[10, 10, 10, 24, 24, 24, 24,  0,  0],
       [10, 10, 24, 24, 24, 24,  1,  1,  1]])
hpaulj
  • The total number of repeats per row need not be the same. That is why I'm finding out the maxlen and then padding each row to be of the same size. – Aditya369 Oct 16 '16 at 00:50
  • And you are padding with the random `empty` values? I've seen masked inserts that can handle variable length rows, but don't recall the details. – hpaulj Oct 16 '16 at 01:26
  • Yes. I am padding them with random empty values. Though it makes more sense to pad with zeros in my case I guess. – Aditya369 Oct 16 '16 at 01:53
  • I'd suggest modifying your example so that this padding becomes important. – hpaulj Oct 16 '16 at 01:53
1

Another solution similar to @hpaulj's solution:

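import numpy as np

def repeat2dvect(arr, rep):
    lens = rep.sum(axis=-1)          # expanded length of each row
    maxlen = lens.max()
    ret_val = np.zeros((arr.shape[0], maxlen))
    # True for the first lens[i] columns of row i, False for the padding
    mask = (lens[:,None]>np.arange(maxlen))
    # Boolean assignment fills the True cells in row-major order
    ret_val[mask] = np.repeat(arr.ravel(), rep.ravel())
    return ret_val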

Instead of storing indices, I'm creating a bool mask and using the mask to set the values.
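A quick way to gain confidence in the mask approach is to compare it against a plain row-by-row loop on random input. The sketch below does that; the names `repeat2d_loop` and `repeat2d_mask` are hypothetical, and both versions pad with zeros and keep the input dtype (an assumption, chosen so the results can be compared exactly).

```python
import numpy as np

def repeat2d_loop(arr, rep):
    # Reference version: row-by-row np.repeat, zero-padded
    lens = rep.sum(axis=-1)
    out = np.zeros((arr.shape[0], lens.max()), dtype=arr.dtype)
    for i in range(arr.shape[0]):
        row = np.repeat(arr[i], rep[i])
        out[i, :row.size] = row
    return out

def repeat2d_mask(arr, rep):
    # Mask version: one np.repeat over the flattened input
    lens = rep.sum(axis=-1)
    out = np.zeros((arr.shape[0], lens.max()), dtype=arr.dtype)
    mask = lens[:, None] > np.arange(lens.max())
    out[mask] = np.repeat(arr.ravel(), rep.ravel())
    return out

rng = np.random.default_rng(0)
arr = rng.integers(0, 100, size=(5, 10))
rep = rng.integers(0, 4, size=(5, 10))
assert np.array_equal(repeat2d_loop(arr, rep), repeat2d_mask(arr, rep))
```

The two agree because the mask has exactly `lens[i]` True cells in row `i`, and boolean assignment fills them in the same row-major order in which `np.repeat` emits the flattened values.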

Aditya369