3

Is there a way to get array elements in one operation when the rows and columns of those elements are known? In each row I would like to access the elements from col_start to col_end (each row has a different starting and ending index). The number of elements is the same for each row, and the elements are consecutive. Example:

[ . . . . | | | . . . . . ]
[ | | | . . . . . . . . . ]
[ . . | | | . . . . . . . ]
[ . . . . . . . . | | | . ]

One solution would be to get the indexes (row-column pairs) of the elements, and then use my_array[row_list,col_list].
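A minimal sketch of that row/column-pair idea, for reference. The array contents, `col_start`, and `n` are made-up values chosen to match the diagram above; they are not from the original problem.

```python
import numpy as np

my_array = np.arange(48).reshape(4, 12)
col_start = np.array([4, 0, 2, 8])   # hypothetical start column for each row
n = 3                                # number of consecutive elements per row

# Build one (row, column) pair for every wanted element.
row_list = np.repeat(np.arange(my_array.shape[0]), n)    # 0,0,0,1,1,1,...
col_list = (col_start[:, None] + np.arange(n)).ravel()   # 4,5,6,0,1,2,...

# Index with the pairs, then fold the flat result back into one row per input row.
picked = my_array[row_list, col_list].reshape(-1, n)
print(picked)
```

The reshape at the end only works because every row contributes the same number of elements.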

Is there any other (simpler) way without using for loops?

recodeFuture
  • Yes, but can you provide a better example? – dursk Jan 17 '15 at 23:38
  • In the example ( | ) are elements I want to access, ( . ) are other elements. Would you like to know anything else? – recodeFuture Jan 17 '15 at 23:45
  • 1
    @tjons: what convinces you that we are working with a dictionary? The OP repeatedly refers to an array; the OP added the `numpy` tag; the representation looks a lot more like that of an array than a dictionary; etc. – DSM Jan 18 '15 at 00:04
  • @DSM my own confusedness. I'm wrong, and I've deleted the other comments. Thank you for pointing this out! On top of it, I didn't mean dictionary - I meant list. Whoops! – tjons Jan 19 '15 at 12:57

3 Answers

4
import numpy as np

A = np.arange(40).reshape(4,10)*.1
startend = [[2,5],[3,6],[4,7],[5,8]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([13, 14, 15]), array([24, 25, 26]), array([35, 36, 37])]
A.flat[index_list]

producing

array([[ 0.2,  0.3,  0.4],
       [ 1.3,  1.4,  1.5],
       [ 2.4,  2.5,  2.6],
       [ 3.5,  3.6,  3.7]])

This still has an iteration, but it's a rather basic one over a list. I'm indexing the flattened, 1d, version of A. np.take(A, index_list) also works.
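As a quick check of the claim that flat indexing and np.take agree, here is a small sketch. It uses the same A and startend as above; the names `flat_idx`, `via_flat`, `via_take`, and `via_ravel` are just labels for this illustration.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
startend = [[2, 5], [3, 6], [4, 7], [5, 8]]

# Offset each row's column range by row_number * row_length to get
# positions in the flattened array.
flat_idx = np.array([np.arange(start, end) + i * A.shape[1]
                     for i, (start, end) in enumerate(startend)])

via_flat = A.flat[flat_idx]       # index the flat iterator
via_take = np.take(A, flat_idx)   # np.take without an axis also uses the flattened array
via_ravel = A.ravel()[flat_idx]   # explicit 1-D view or copy, then fancy indexing

print(via_flat)
```

All three results have the shape of flat_idx, here (4, 3).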

If the row intervals differ in size, I can use np.r_ to concatenate them. It's not absolutely necessary, but it is a convenience when building up indices from multiple intervals and values.

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.3,  1.4,  1.5,  2.4,  2.5,  2.6,  3.5,  3.6, 3.7])

The idx that ajcr uses can be used without choose:

idx = [np.arange(v[0], v[1]) for v in startend]
A[np.arange(A.shape[0])[:,None], idx]

idx is like my index_list except that it doesn't add the row length.

np.array(idx)

array([[2, 3, 4],
       [3, 4, 5],
       [4, 5, 6],
       [5, 6, 7]])

Since each arange has the same length, idx can be generated without iteration:

col_start = np.array([2,3,4,5])
idx = col_start[:,None] + np.arange(3)

The first index is a column array that broadcasts to match this idx.

np.arange(A.shape[0])[:,None] 
array([[0],
       [1],
       [2],
       [3]])
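Putting the two index arrays together, the broadcasting can be sketched like this. `np.take_along_axis` is not part of the answer above; it is shown only as an alternative spelling that I would expect to give the same result on NumPy 1.15 or later.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 3, 4, 5])
n = 3

idx = col_start[:, None] + np.arange(n)   # shape (4, 3): column indices
rows = np.arange(A.shape[0])[:, None]     # shape (4, 1): row indices

# (4, 1) and (4, 3) broadcast to (4, 3), so element [i, j] is A[i, idx[i, j]].
by_broadcast = A[rows, idx]

# Alternative (assumption: NumPy >= 1.15): the row index is implied by axis=1.
by_take_along = np.take_along_axis(A, idx, axis=1)

print(by_broadcast)
```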

With this A and idx I get the following timings:

In [515]: timeit np.choose(idx,A.T[:,:,None])
10000 loops, best of 3: 30.8 µs per loop

In [516]: timeit A[np.arange(A.shape[0])[:,None],idx]
100000 loops, best of 3: 10.8 µs per loop

In [517]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 24.9 µs per loop

The flat indexing itself is faster, but calculating the fancier index takes up some time, so on this small array the flat version is slower overall.

For large arrays, the speed of flat indexing dominates.

A=np.arange(4000).reshape(40,100)*.1
col_start=np.arange(20,60)
idx=col_start[:,None]+np.arange(30)

In [536]: timeit A[np.arange(A.shape[0])[:,None],idx]
10000 loops, best of 3: 108 µs per loop

In [537]: timeit A.flat[idx+np.arange(A.shape[0])[:,None]*A.shape[1]]
10000 loops, best of 3: 59.4 µs per loop

On this larger array the np.choose method runs into a hardcoded limit and fails with: Need between 2 and (32) array objects (inclusive).
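A hedged sketch of how that limit shows up in practice. The exact cap and the error text may differ between NumPy versions, so the code only assumes that 100 choice arrays is too many and that a ValueError is raised; `choose_failed` is a name made up for this illustration.

```python
import numpy as np

A = np.arange(4000).reshape(40, 100) * 0.1
col_start = np.arange(20, 60)
idx = col_start[:, None] + np.arange(30)

# np.choose treats each of the 100 columns as a separate choice array.
try:
    np.choose(idx, A.T[:, :, None])
    choose_failed = False
except ValueError as exc:
    choose_failed = True
    print("np.choose failed:", exc)

# Plain fancy indexing has no such cap on the number of columns.
result = A[np.arange(A.shape[0])[:, None], idx]
print(result.shape)
```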


What about out-of-bounds idx?

col_start=np.array([2,4,6,8])
idx=col_start[:,None]+np.arange(3)
A[np.arange(A.shape[0])[:,None], idx]

produces an error because the last idx value is 10, too large.

You could clip idx

idx=idx.clip(0,A.shape[1]-1)

producing duplicate values in the last row

[ 3.8,  3.9,  3.9]

You could also pad A before indexing. See np.pad for more options.

np.pad(A,((0,0),(0,2)),'edge')[np.arange(A.shape[0])[:,None], idx]

Another option is to remove out of bounds values. idx would then become a ragged list of lists (or array of lists). The flat approach can handle this, though the result will not be a matrix.

startend = [[2,5],[4,7],[6,9],[8,10]]
index_list = [np.arange(v[0],v[1]) + i*A.shape[1] 
                 for i,v in enumerate(startend)]
# [array([2, 3, 4]), array([14, 15, 16]), array([26, 27, 28]), array([38, 39])]

A.flat[np.r_[tuple(index_list)]]
# array([ 0.2,  0.3,  0.4,  1.4,  1.5,  1.6,  2.6,  2.7,  2.8,  3.8,  3.9])
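If a rectangular result is still wanted, one more option is to clip the index and then mask the clipped positions. This is only a sketch: it assumes the out-of-range columns are always past the right edge (col_start is never negative), and using NaN as the fill value is my choice for the example, not something from the question. `valid`, `safe_idx`, and `row_max` are illustrative names.

```python
import numpy as np

A = np.arange(40).reshape(4, 10) * 0.1
col_start = np.array([2, 4, 6, 8])
n = 3

idx = col_start[:, None] + np.arange(n)
valid = idx < A.shape[1]                     # False where the column runs off the end
safe_idx = np.minimum(idx, A.shape[1] - 1)   # clip so the lookup itself cannot fail

picked = A[np.arange(A.shape[0])[:, None], safe_idx]
picked = np.where(valid, picked, np.nan)     # blank out the clipped duplicates

# Per-row maximum over the in-range elements only.
row_max = np.nanmax(picked, axis=1)
print(picked)
print(row_max)
```

The NaN fill keeps the result a regular 2-D array, and NaN-aware reductions such as np.nanmax then ignore the out-of-range slots.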
hpaulj
  • Do you think using list comprehension will be faster than simply using a for loop? – recodeFuture Jan 18 '15 at 10:51
  • For constant length ranges you don't need any iteration - just matrix addition. – hpaulj Jan 18 '15 at 22:06
  • I measured it myself and your method is faster indeed. Do you have any suggestions on how to prevent index out of bounds? – recodeFuture Jan 20 '15 at 12:46
  • Out of bounds - like if a `col_start` value is too large, so `col_start+n>A.shape[1]`? What should happen? – hpaulj Jan 20 '15 at 18:09
  • No, if col_start + n value is too large. I would like to "get" only elements that are in range - if A.shape[1] == 15 and col_start is 13, I would like to acquire only last two elements. If there is no "pretty" solution to this problem, I will use some sort of padding or some index checking and processing before acquiring array elements. – recodeFuture Jan 20 '15 at 21:38
  • 1
    I added some examples of dealing with out-of-bounds. – hpaulj Jan 21 '15 at 03:50
  • Thanks, your answers are really helpful. Does anything simplify if the size of the second axis for idx is 1? After accessing those elements in matrix A and some calculations I get indexes for another array. I would like to get elements of that matrix as a 1D array or a list. Any ideas? (That is my last question, I promise :) ) – recodeFuture Jan 21 '15 at 07:49
  • Offhand I'm not following your last question. Can you start a new question with some more detail? – hpaulj Jan 21 '15 at 07:56
  • Thanks. I tried to use it, but I didn't get the desired output, because of this: [:,None]. It works fine now that I removed it. – recodeFuture Jan 22 '15 at 06:56
3

You can use np.choose.

Here's an example NumPy array arr:

array([[ 0,  1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12, 13],
       [14, 15, 16, 17, 18, 19, 20]])

Let's say we want to pick the values [1, 2, 3] from the first row, [11, 12, 13] from the second row and [17, 18, 19] from the third row.

In other words, we'll pick out the indices from each row of arr as shown in an array idx:

array([[1, 2, 3],
       [4, 5, 6],
       [3, 4, 5]])

Then using np.choose:

>>> np.choose(idx, arr.T[:,:,np.newaxis])
array([[ 1,  2,  3],
       [11, 12, 13],
       [17, 18, 19]])

To explain what just happened: arr.T[:,:,np.newaxis] meant that arr was temporarily viewed as a 3D array with shape (7, 3, 1). You can imagine this as a 3D array where each column of the original arr is now a 2D column vector with three values. The 3D array looks a bit like this:

#  0       1       2       3       4       5       6
[[ 0]   [[ 1]   [[ 2]   [[ 3]   [[ 4]   [[ 5]   [[ 6]   # choose values from 1, 2, 3
 [ 7]    [ 8]    [ 9]    [10]    [11]    [12]    [13]   # choose values from 4, 5, 6
 [14]]   [15]]   [16]]   [17]]   [18]]   [19]]   [20]]  # choose values from 3, 4, 5

To get the zeroth row of the output array, choose selects the zeroth element from the 2D column at index 1, the zeroth element from the 2D column at index 2, and the zeroth element from the 2D column at index 3.

To get the first row of the output array, choose selects the first element from the 2D column at index 4, the first element from the 2D column at index 5, ... and so on.
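To make the selection rule concrete, here is a small sketch that compares np.choose against the same lookup written as explicit loops. The loop version and the name `expected` are only for illustration; they are not how you would write this in practice.

```python
import numpy as np

arr = np.arange(21).reshape(3, 7)
idx = np.array([[1, 2, 3],
                [4, 5, 6],
                [3, 4, 5]])

chosen = np.choose(idx, arr.T[:, :, np.newaxis])

# The same selection spelled out element by element:
# output[i, j] = arr[i, idx[i, j]]
expected = np.empty_like(idx)
for i in range(idx.shape[0]):
    for j in range(idx.shape[1]):
        expected[i, j] = arr[i, idx[i, j]]

print(chosen)
print(np.array_equal(chosen, expected))
```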

Alex Riley
  • Thanks, that looks like what I was thinking of. Now I have to check the performance of given solutions. – recodeFuture Jan 18 '15 at 16:29
  • I have one more question. What is the best way to create idx array if I have col_start vector and col_end vector equals (col_start + n)? – recodeFuture Jan 18 '15 at 20:17
  • @soccersniper: one way could be to use `np.vstack` and a list comprehension, e.g. `np.vstack([np.arange(x, x+n) for x in col_start])`. So above in my example, `n` is `3` and `col_start` is `[1, 4, 3]`. – Alex Riley Jan 18 '15 at 20:22
  • Because n << len(col_start) I would rather do this: np.array( [col_start+i for i in range(n)] ) (same result if I use np.vstack). I would have to transpose this array to use your solution. Is there any other way? – recodeFuture Jan 18 '15 at 20:50
  • It is possible to index `arr` with `idx` without `choose` - just use a matching column array for the 1st dimension. – hpaulj Jan 18 '15 at 21:49
  • I added `idx = col_start[:,None] + np.arange(3)` to my answer. – hpaulj Jan 18 '15 at 22:01
1

I think you're looking for something like the below. I'm not sure what you want to do with them when you access them though.

indexes = [(4, 6), (0, 2), (2, 4), (8, 10)]
# Each row is a list of characters: '|' marks wanted elements, '.' the rest.
arr = [
    list("....|||....."),
    list("|||........."),
    list("..|||......."),
    list("........|||."),
]

for (start, end), row in zip(indexes, arr):
    # end is inclusive here, hence the +1 in the slice
    print(row[start:end + 1])
dursk
  • only problem is you now don't have a numpy array – Padraic Cunningham Jan 18 '15 at 00:30
  • I want to find max value for "masked" elements in each row. Solution for accessing those elements would be simple if columns would be the same for all rows: my_array[:,col_start:col_end]. What I was looking for was a modification of previous statement in the case of different column indexes. – recodeFuture Jan 18 '15 at 10:33
  • Where is numpy array coming from? OP says nothing about that? And @tjons: nothing in my answer is a dictionary? – dursk Jan 18 '15 at 13:31
  • Original array contains dot products between direction vectors and gradient vectors on "rays" pointing from the center outwards for given angles. So i-th row of dot_product array contains dot products along the "ray" for i-th angle. – recodeFuture Jan 18 '15 at 16:18
  • @mattm my own confusedness. I'm wrong, and I've deleted the other comments. Thank you for pointing this out! On top of it, I didn't mean dictionary - I meant list. Whoops! – tjons Jan 19 '15 at 12:58