
Given a 2d array:

a = np.array([[10,0,30,10],[40,50,60,10],[70,80,90,10]])

An index array as an array of objects:

i = np.array([[0,1],[0,2],[0,1,2]])  #Note different lengths

Expected result:

e = [[10,0,30,10,40,50,60,10],[10,0,30,10,70,80,90,10],[10,0,30,10,40,50,60,10,70,80,90,10]]   

What works:

e = [np.hstack(a[i[j]]) for j in range(len(i))]

Is there a way to do this in a pure vectorized manner?

I found out that numpy.where() doesn't work here, since it requires the rows of the index array (i.e. i) to be of the same length, which is not the case for me. Can someone point me in the right direction?

EDIT: Adding to the above question, I am also interested in how to do the same operation when the array 'a' changes to:

a = np.array([[10,0,30,10],[40,50,60,10],[70,80,90,10,30]])#NOTE:Jagged array

The index array 'i' however stays the same!
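For reference, the list comprehension above still carries over to the jagged 'a' (a sketch; recent NumPy versions need an explicit dtype=object to build a ragged array):

```python
import numpy as np

# ragged rows need an explicit object dtype in recent NumPy versions
a = np.array([[10, 0, 30, 10],
              [40, 50, 60, 10],
              [70, 80, 90, 10, 30]], dtype=object)
i = [[0, 1], [0, 2], [0, 1, 2]]  # plain list of lists, same indices as before

# a[idx] selects the row objects, np.hstack flattens them into one array
e = [np.hstack(a[idx]) for idx in i]
```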

stut
  • Does this answer your question? [How do I stack vectors of different lengths in NumPy?](https://stackoverflow.com/questions/14916407/how-do-i-stack-vectors-of-different-lengths-in-numpy) – Joe Apr 29 '20 at 12:10
  • https://stackoverflow.com/questions/37212981/python-jagged-array-operation-efficiency – Joe Apr 29 '20 at 12:10
  • I have to go through the details of awkward-array @Joe – stut Apr 29 '20 at 12:51
  • https://stackoverflow.com/a/3386428/7919597 – Joe Apr 29 '20 at 12:57

3 Answers


If a is jagged but i is multidimensional, we can use i to index a:

In [78]: a = np.array([[10,0,30,10],[40,50,60,10],[70,80,90,10,30]])#NOTE:Jagged array                 
In [79]: i = np.array([[0,1],[0,2],[1,2]])                                                             

In [80]: a.shape    #  an array of list objects                                                                                       
Out[80]: (3,)

In [81]: a[i]                                                                                          
Out[81]: 
array([[list([10, 0, 30, 10]), list([40, 50, 60, 10])],
       [list([10, 0, 30, 10]), list([70, 80, 90, 10, 30])],
       [list([40, 50, 60, 10]), list([70, 80, 90, 10, 30])]], dtype=object)

Since these are list objects, we can use sum to "concatenate" them:

In [82]: a[i].sum(axis=1)                                                                              
Out[82]: 
array([list([10, 0, 30, 10, 40, 50, 60, 10]),
       list([10, 0, 30, 10, 70, 80, 90, 10, 30]),
       list([40, 50, 60, 10, 70, 80, 90, 10, 30])], dtype=object)

Your list comprehension:

In [83]: e = [np.hstack(a[i[j]]) for j in range(len(i))]                                               
In [84]: e                                                                                             
Out[84]: 
[array([10,  0, 30, 10, 40, 50, 60, 10]),
 array([10,  0, 30, 10, 70, 80, 90, 10, 30]),
 array([40, 50, 60, 10, 70, 80, 90, 10, 30])]

some timings:

In [85]: timeit a[i].sum(axis=1)                                                                       
8.64 µs ± 17.2 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
In [86]: timeit e = [np.hstack(a[i[j]]) for j in range(len(i))]                                        
63.3 µs ± 168 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Your hstack may be slower because it has to convert the lists in a to arrays. Let's bypass that:

In [89]: [sum(a[i[j]],[]) for j in range(len(i))]                                                      
Out[89]: 
[[10, 0, 30, 10, 40, 50, 60, 10],
 [10, 0, 30, 10, 70, 80, 90, 10, 30],
 [40, 50, 60, 10, 70, 80, 90, 10, 30]]
In [90]: timeit [sum(a[i[j]],[]) for j in range(len(i))]                                               
8.41 µs ± 109 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

Sometimes pure list solutions are faster. Converting lists to arrays takes time.
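The same idea works with no NumPy objects at all, as a point of comparison (sum(..., []) concatenates the selected lists):

```python
# pure-Python version: plain nested lists, no arrays anywhere
a_list = [[10, 0, 30, 10], [40, 50, 60, 10], [70, 80, 90, 10, 30]]
i_list = [[0, 1], [0, 2], [1, 2]]

# sum with [] as the start value chains list concatenation per group
e = [sum((a_list[k] for k in grp), []) for grp in i_list]
```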

===

If both arrays are rectangular and multidimensional, we can use a pure "vectorized" solution:

In [104]: aa = np.array([[10,0,30,10],[40,50,60,10],[70,80,90,10]])                                    
In [105]: i                                                                                            
Out[105]: 
array([[0, 1],
       [0, 2],
       [1, 2]])
In [106]: aa[i]                                                                                        
Out[106]: 
array([[[10,  0, 30, 10],
        [40, 50, 60, 10]],

       [[10,  0, 30, 10],
        [70, 80, 90, 10]],

       [[40, 50, 60, 10],
        [70, 80, 90, 10]]])
In [107]: aa[i].reshape(3,-1)                                                                          
Out[107]: 
array([[10,  0, 30, 10, 40, 50, 60, 10],
       [10,  0, 30, 10, 70, 80, 90, 10],
       [40, 50, 60, 10, 70, 80, 90, 10]])
In [108]: timeit aa[i].reshape(3,-1)                                                                   
5.07 µs ± 57.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

But once one or more of the arrays/lists is ragged you lose this option, and need to seriously consider list alternatives.
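One middle ground when a is ragged (a sketch, not a true vectorization): concatenate all selected rows once, then split at the group boundaries:

```python
import numpy as np

# ragged data kept as a list of 1-D arrays
a = [np.array(r) for r in ([10, 0, 30, 10], [40, 50, 60, 10], [70, 80, 90, 10, 30])]
i = [[0, 1], [0, 2], [1, 2]]

# one big concatenate over every selected row, in group order
flat = np.concatenate([a[k] for grp in i for k in grp])

# the total length of each group tells us where to cut
lengths = [sum(len(a[k]) for k in grp) for grp in i]
e = np.split(flat, np.cumsum(lengths)[:-1])
```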

hpaulj

NumPy arrays like to be of uniform size. In your case, your i array would work just as well as a list of lists, because its rows have inconsistent lengths.

Your answer works fine. I have another solution, but it works in a similar fashion to yours:

import numpy as np

a = np.array([[10,0,30,10],[40,50,60,10],[70,80,90,10]])
# 'i' can also be a list of lists
i = [[0,1],[0,2],[0,1,2]]

[np.concatenate(a[j]) for j in i]
>>> [array([10,  0, 30, 10, 40, 50, 60, 10]),
     array([10,  0, 30, 10, 70, 80, 90, 10]),
     array([10,  0, 30, 10, 40, 50, 60, 10, 70, 80, 90, 10])]

Another method using vectors:

# make 'i' a consistent shape by including NaN values 
i = np.array([[0,1,np.nan],[0,2,np.nan],[0,1,2]])

# filter out NaN values and index
a[i[np.isfinite(i)].astype(int)]
>>> array([[10,  0, 30, 10],
           [40, 50, 60, 10],
           [10,  0, 30, 10],
           [70, 80, 90, 10],
           [10,  0, 30, 10],
           [40, 50, 60, 10],
           [70, 80, 90, 10]])

I think at this point you'd still have to use a loop to concatenate properly, so either the first method or your answer is more straightforward.
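If you want to push a bit more of this into NumPy anyway, the masked result above can be split back into groups and flattened per group; a sketch building on the NaN-padding idea:

```python
import numpy as np

a = np.array([[10, 0, 30, 10], [40, 50, 60, 10], [70, 80, 90, 10]])
i = np.array([[0, 1, np.nan], [0, 2, np.nan], [0, 1, 2]])

mask = np.isfinite(i)
rows = a[i[mask].astype(int)]      # all selected rows stacked, shape (7, 4)
counts = mask.sum(axis=1)          # rows per group: [2, 2, 3]

# split the stack at the group boundaries, then flatten each group
e = [g.ravel() for g in np.split(rows, np.cumsum(counts)[:-1])]
```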

Paddy Harrison
  • Yes, I have tried that too, but it doesn't solve my issue as I don't want a for loop to be involved. In fact, concatenate does the same job as hstack. Do you know any way of doing this purely using numpy? – stut Apr 29 '20 at 11:54
  • That's my best unfortunately, I would be interested to know if someone manages it! – Paddy Harrison Apr 29 '20 at 12:00

If you simply want to hide the for loop you could use map like this:

In [891]: import numpy as np

In [892]: a = np.array([[10,  0, 30, 10], 
     ...:               [40, 50, 60, 10], 
     ...:               [70, 80, 90, 10]])

In [893]: i = [[0, 1], 
     ...:      [0, 2], 
     ...:      [0, 1, 2]]

In [894]: e = [[10, 0, 30, 10, 40, 50, 60, 10], 
     ...:      [10, 0, 30, 10, 70, 80, 90, 10], 
     ...:      [10, 0, 30, 10, 40, 50, 60, 10, 70, 80, 90, 10]]

In [895]: e == list(map(lambda x: a[x].flatten().tolist(), i))
Out[895]: True

Notice that the code above is not actually vectorized (see List comprehension vs. map).

Tonechas
  • Yes, that seems to work too. Any idea what to do if we have the array as: a = np.array([[10, 0, 30, 10], [40, 50, 60, 10], [70, 80, 90, 10, 50]]) instead? Then it seems we have to use list(map(lambda x: np.concatenate(a[x].flatten().tolist()), i)) to get the desired output. But it becomes slower when you have a list 'i' of say 2000 elements. Is there a workaround too? – stut Apr 29 '20 at 12:57
  • `map` just uses a different syntax than a list comprehension. Python implements the two with similar speed. When developing python3 some wanted to drop `map` as unnecessary. It is not a `numpy` "vectorization" tool. – hpaulj Apr 29 '20 at 16:41