
I want to apply a list (or array) of indices to a numpy array and get a result of shape ( len(indices), (shape of one indexing operation) ).

Is there any way to use a list of indices directly, without a for loop like the one in the minimal example shown below?

import numpy as np

c = np.random.randint(0, 5, size=(4, 5))
indices = [[0, slice(0, 4)], [1, slice(0, 4)], [1, slice(0, 4)], [2, slice(0, 4)]]

# desired result using a for loop
res = []
for idx in indices:
    res.append(c[tuple(idx)])  # tuple() is needed on recent NumPy versions

Note that the indices list shown here is only an example and is not representative of my problem; in general it is generated at runtime. However, each index operation returns the same shape.

sklingel
  • If the edited sample data is a representative one, then how about `row_id = [idx[0] for idx in indices]` and then `res = np.vsplit(c[row_id,:4],4)`? The calculation of `row_id` is using a for-loop, which I don't think you can avoid, as `indices` is a list. – Divakar Sep 04 '15 at 08:56
  • This might be a solution, thanks. indices does not have to be a list, it could also be transformed or generated as an array of that form – sklingel Sep 04 '15 at 09:02
  • I think if the indices is a numpy array or a list that is not a nested one, it *could* be vectorized. – Divakar Sep 04 '15 at 09:03
  • If the suggested code works for you, let me know. I will edit the posted solution accordingly. – Divakar Sep 04 '15 at 12:54

2 Answers


It seems that you are basically taking the first 2 rows and the first 4 columns of the 2D input array and then splitting the result row by row. You can do the slicing with c[:2,:4] and then split the rows with np.vsplit, which gives a one-liner solution like so -

res_out = np.vsplit(c[:2,:4],2)

Sample run -

In [10]: c
Out[10]: 
array([[0, 2, 5, 1, 0],
       [1, 5, 5, 0, 3],
       [0, 1, 0, 6, 6],
       [2, 6, 2, 3, 3]])

In [11]: indices
Out[11]: [[0, slice(0, 4, None)], [1, slice(0, 4, None)]]

In [12]: # desired result using a for loop
    ...: res = []
    ...: for idx in indices:
    ...:     res.append(c[idx])
    ...:     

In [13]: res
Out[13]: [array([0, 2, 5, 1]), array([1, 5, 5, 0])]

In [14]: np.vsplit(c[:2,:4],2)
Out[14]: [array([[0, 2, 5, 1]]), array([[1, 5, 5, 0]])]

Please note that the output from np.vsplit would be a list of 2D arrays, rather than a list of 1D arrays as with the posted code in the question.
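Following the suggestion in the comments on the question, here is a minimal sketch of the same idea for a row list with duplicates. The array c and the list row_id below are made-up sample data, not the values from the question, and the sketch assumes every index uses the same :4 column slice.

```python
import numpy as np

# Assumed sample data: a small array with predictable values
c = np.arange(20).reshape(4, 5)
row_id = [0, 1, 1, 2]            # row indices; duplicates are allowed

# One fancy-indexing call picks all rows, the slice trims the columns
block = c[row_id, :4]            # shape (4, 4)

# np.vsplit keeps each piece 2D, with shape (1, 4)
pieces_2d = np.vsplit(block, len(row_id))

# Iterating over the first axis gives 1D rows, like the for loop does
pieces_1d = list(block)
```

If a list of 1D arrays is not actually required, block itself is probably the most convenient form to work with.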

Divakar
  • Nice solution, but I think my minimal example is misleading. The indices list is generated generically and its size is not determined beforehand. It can also contain duplicated indices, so I guess a split won't do the trick in my use case. – sklingel Sep 04 '15 at 08:40
  • @sklingel Edit your sample data in the question with a bit more generic/representative case you might come across? Any ideas on how the indices are generated? – Divakar Sep 04 '15 at 08:40

Your example can be rewritten as a list comprehension:

In [121]: [c[idx] for idx in indices]
Out[121]: 
[array([4, 2, 1, 2]),
 array([3, 2, 2, 3]),
 array([3, 2, 2, 3]),
 array([0, 3, 4, 4])]

which can be turned into a nice 2d array:

In [122]: np.array([c[idx] for idx in indices])
Out[122]: 
array([[4, 2, 1, 2],
       [3, 2, 2, 3],
       [3, 2, 2, 3],
       [0, 3, 4, 4]])

Here np.array() is a form of concatenation, joining the arrays along a new axis.
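To make the "new axis" point concrete, here is a small sketch that uses np.stack, which does the same joining explicitly. The array c is assumed sample data. The tuple(idx) call is my addition: as far as I know, recent NumPy versions no longer accept a list in place of a tuple for a multidimensional index.

```python
import numpy as np

# Assumed sample data, not the random array from the question
c = np.arange(20).reshape(4, 5)
indices = [[0, slice(0, 4)], [1, slice(0, 4)], [1, slice(0, 4)], [2, slice(0, 4)]]

# Each element is a 1D array of shape (4,)
rows = [c[tuple(idx)] for idx in indices]

# np.stack joins equal-shaped arrays along a new leading axis
stacked = np.stack(rows, axis=0)          # shape (4, 4)

# np.array on a list of equal-shaped arrays gives the same result
same = np.array_equal(stacked, np.array(rows))
```

Both calls only work when every piece has the same shape, which matches the condition stated in the question.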

Since the 2nd index is the same for all rows (slice(4)), this indexing also works:

In [123]: c[[0,1,1,2],slice(4)]  # or c[[0,1,1,2],:4]
Out[123]: 
array([[4, 2, 1, 2],
       [3, 2, 2, 3],
       [3, 2, 2, 3],
       [0, 3, 4, 4]])

Repetition on the 1st axis is not a problem. Differing slices in the 2nd take some more manipulation. Except for this special :4 case, you will have to turn the slices into ranges. There's no way of indexing one dimension with multiple slices.
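As a rough illustration of turning a slice into a range, the sketch below uses the built-in slice.indices method to expand a slice into explicit column numbers. The helper name slice_to_range and the array c are my own, chosen only for this example.

```python
import numpy as np

# Assumed sample data, not the random array from the question
c = np.arange(20).reshape(4, 5)

def slice_to_range(s, length):
    """Hypothetical helper: expand a slice into the indices it selects."""
    # slice.indices() fills in None values and clips to the axis length
    return np.arange(*s.indices(length))

# An explicit slice becomes an explicit index array
cols = slice_to_range(slice(2, 5), c.shape[1])

# Indexing with the array selects the same elements as the slice
same = np.array_equal(c[1, cols], c[1, 2:5])

# An open-ended slice such as slice(4) is handled as well
open_ended = slice_to_range(slice(4), c.shape[1])
```

Unlike a slice, the index array triggers advanced indexing, so the result is a copy rather than a view.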


The case where the slices all have the same length, but different 'start' values, is similar to the one discussed in https://stackoverflow.com/a/28007256/901925 (access-multiple-elements-of-an-array).

In [135]: c.flat[[i*c.shape[1]+np.arange(j.start,j.stop) for i,j in indices]]
Out[135]: 
array([[4, 2, 1, 2],
       [3, 2, 2, 3],
       [3, 2, 2, 3],
       [0, 3, 4, 4]])

The indices that I generate this way are:

In [136]: [i*c.shape[1]+np.arange(j.start,j.stop) for i,j in indices]
Out[136]: 
[array([0, 1, 2, 3]),
 array([5, 6, 7, 8]),
 array([5, 6, 7, 8]),
 array([10, 11, 12, 13])]

It works fine if indices is somewhat irregular: indices1 = [[0, slice(0, 3)], [1, slice(2, 5)], [1, slice(1, 4)], [2, slice(0, 3)]]

My earlier answer looks at some other ways of indexing. But indexing a flattened array is often fastest, even if you take into account the calculation required to generate the index array.
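When all slices have the same length, the flat index array can also be built with broadcasting instead of a list comprehension. This is only a sketch: the names rows, starts and length, and the sample array c, are assumptions for illustration, and it shows the index construction rather than any timing.

```python
import numpy as np

# Assumed sample data; its values equal their own flat positions
c = np.arange(20).reshape(4, 5)

rows = np.array([0, 1, 1, 2])     # row of each selection
starts = np.array([0, 2, 1, 0])   # start column of each selection
length = 3                        # common slice length

# (4, 1) row offsets + (4, 1) start columns + (3,) steps -> (4, 3) flat indices
flat_idx = rows[:, None] * c.shape[1] + starts[:, None] + np.arange(length)

# Index the flattened array with the 2D index array
res = c.ravel()[flat_idx]         # shape (4, 3)
```

The row offset c.shape[1] assumes a C-ordered 2D array, the same assumption the c.flat expressions above make.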

If the slices vary in length, then you are stuck with generating a list of arrays, or an hstack of such a list:

In [158]: indices2 = [[0, slice(0, 2)], [1, slice(2, 5)],
     ...:             [1, slice(0, 4)], [2, slice(0, 5)]]

In [159]: c.flat[np.hstack([i*c.shape[1]+np.arange(j.start,j.stop)
     ...:                   for i,j in indices2])]
Out[159]: array([4, 2, 2, 3, 1, 3, 2, 2, 3, 0, 3, 4, 4, 3])

In [160]: [c.flat[i*c.shape[1]+np.arange(j.start,j.stop)] for i,j in indices2]
Out[160]: [array([4, 2]), array([2, 3, 1]), array([3, 2, 2, 3]),
    array([0, 3, 4, 4, 3])]

In [161]: np.hstack(_)
Out[161]: array([4, 2, 2, 3, 1, 3, 2, 2, 3, 0, 3, 4, 4, 3])
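If the flat hstack result is more convenient to compute but the per-index pieces are still needed, they can be recovered afterwards from the slice lengths. A minimal sketch, assuming the sample array c below rather than the random one used in the session above:

```python
import numpy as np

# Assumed sample data; its values equal their own flat positions
c = np.arange(20).reshape(4, 5)
indices2 = [[0, slice(0, 2)], [1, slice(2, 5)], [1, slice(0, 4)], [2, slice(0, 5)]]

# One flat index array covering all selections, then a single lookup
flat_idx = np.hstack([i * c.shape[1] + np.arange(j.start, j.stop)
                      for i, j in indices2])
values = c.flat[flat_idx]

# Cut the flat result back into pieces at the cumulative slice lengths
lengths = [j.stop - j.start for _, j in indices2]
pieces = np.split(values, np.cumsum(lengths)[:-1])
```

The pieces have different lengths, so they stay a list of 1D arrays and cannot be combined into a regular 2D array.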

More on slices with varying starts but equal lengths:

In [190]: indices1 = [[0, slice(0, 3)], [1, slice(2, 5)], [1, slice(1, 4)], [2, slice(0, 3)]]

In [191]: c.flat[[i*c.shape[1]+np.arange(j.start,j.stop) for i,j in indices1]]
Out[191]: 
array([[4, 2, 1],
       [2, 3, 1],
       [2, 2, 3],
       [0, 3, 4]])

In [193]: rows = [[i] for i,j in indices1]
In [200]: cols=[np.arange(j.start,j.stop) for i,j in indices1]

In [201]: c[rows,cols]
Out[201]: 
array([[4, 2, 1],
       [2, 3, 1],
       [2, 2, 3],
       [0, 3, 4]])

In this case rows is a column-shaped nested list, with shape (4, 1), that broadcasts against cols, with shape (4, 3).
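The broadcasting step can be checked explicitly with np.broadcast_arrays. This is a sketch with an assumed sample array c; rows and cols are written out by hand to match the shapes described above.

```python
import numpy as np

# Assumed sample data, not the random array from the session above
c = np.arange(20).reshape(4, 5)

rows = np.array([[0], [1], [1], [2]])                          # shape (4, 1)
cols = np.array([[0, 1, 2], [2, 3, 4], [1, 2, 3], [0, 1, 2]])  # shape (4, 3)

# Index arrays are broadcast against each other before the lookup
r, k = np.broadcast_arrays(rows, cols)    # both now have shape (4, 3)

res = c[rows, cols]                       # shape (4, 3)

# Indexing with the already-broadcast arrays gives the same result
same = np.array_equal(res, c[r, k])
```

Each output element res[m, n] is c[rows[m, 0], cols[m, n]], which is why a single column of row numbers is enough.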

hpaulj