Construct Numpy index given list of starting and ending positions

Question

I have two identically-sized numpy.array objects (both one-dimensional), one of which contains a list of starting index positions, and the other of which contains a list of ending index positions (alternatively you could say I have a list of starting positions and window lengths). In case it matters, the slices formed by the starting and ending positions are guaranteed to be non-overlapping. I am trying to figure out how to use these starting and ending positions to form an index for another array object, without having to use a loop.

For example:

import numpy as np
start = np.array([1,7,20])
end = np.array([3,10,25])

Want to reference

somearray[1,2,7,8,9,20,21,22,23,24])

Sven Marnach · Accepted Answer · 2011-01-17T06:20:15.697

7

I would use

np.r_[tuple(slice(s, e) for s, e in zip(start, end))]

EDIT: Here is a solution that does not use a Python loop:

def indices(start, end):
    lens = end - start
    np.cumsum(lens, out=lens)
    i = np.ones(lens[-1], dtype=int)
    i[0] = start[0]
    i[lens[:-1]] += start[1:]
    i[lens[:-1]] -= end[:-1]
    np.cumsum(i, out=i)
    return i

This only creates a single temporary NumPy array (lens) and is much faster than any of the other solutions.

edited Jan 17 '11 at 06:20

answered Jan 16 '11 at 23:13

Sven Marnach

574,206
118
941
841

Thanks Sven, however, I find this is still a bit slower than the solution provided by Joe Kington. – Abiel Jan 16 '11 at 23:22
@Abiel: Did not have the time to time it yesterday. Here is another attempt :) – Sven Marnach Jan 17 '11 at 06:24
Sven - Thanks for your updated solution. This one delivered the best performance. – Abiel Jan 17 '11 at 19:25
@Abiel: It is also less obvious than the other solutions, either. But if you really need performance, you will have to use it :) – Sven Marnach Jan 17 '11 at 19:28

score 2 · Answer 2 · answered Jan 16 '11 at 19:57

2

Numpy's arange creates each individual sequence, so just string them together. How about this?

In [11]: idx = np.hstack([np.arange(s,e) for s,e in  zip(start, end)])

In [12]: idx
Out[12]: array([ 1,  2,  7,  8,  9, 20, 21, 22, 23, 24])

And then you can access somearray[idx].

answered Jan 16 '11 at 19:57

Andrew Jaffe

26,554
4
50
59

1

Equivalently, you can just do `np.hstack([np.r_[s:e] for s,e in zip(start, end)])`. It's a bit less readable, if you're not familiar with `numpy.r_`, though. – Joe Kington Jan 16 '11 at 20:00
Thanks Andrew and Joe. One question is, is there some way to do this while avoiding even the list comprehension? Certainly this is fast, but I still find that most of my function's runtime occurs on the line building up idx (the rest of the function uses numpy routines to identify start and end positions for windows that match a certain length criteria). – Abiel Jan 16 '11 at 21:54
Both these methods generate the intermediate arrays vs just generating the the integer indices of the arrays. Kinda wasteful if you are talking large arrays. – the wolf Jan 16 '11 at 22:13
Numpy can only deal with fixed-length arrays, so it's hard to imagine a (simple) version of the algorithm that doesn't go via an intermediate list. – Andrew Jaffe Jan 16 '11 at 22:44
True: The numpy array must be defined and fixed at the time of creation. However, the portion `[np.arange(s,e) for s,e in zip(start, end)]` generates 3 numpy arrays that are then stacked using `hstack` So you are creating 4 numpy arrays to end up with one. If, instead, you use slice syntax with `r_` or create one python list then you are only creating 1 or 2 lists in total... – the wolf Jan 17 '11 at 04:36

score 0 · Answer 3 · answered Jan 17 '11 at 00:42

0

How about this:

>>> import numpy as np
>>> start = np.array([1,7,20])
>>> end = np.array([3,10,25])
>>> na=np.fromiter(sum([range(s,e) for s,e in zip(start,end)],[]),np.int)
>>> na
array([ 1,  2,  7,  8,  9, 20, 21, 22, 23, 24])

The advantage is 1) no intermediate numpy float arrays; 2) resulting array is integer to most efficiently address other numpy arrays.

answered Jan 17 '11 at 00:42

dawg

98,345
23
131
206

Thanks drewk, unfortunately I found this an order of magnitude slower than solutions by Andrew/Joe/Sven. Operating over an array with 100,000 elements and many starting and ending points, Joe's solution executed slightly over 40x as fast. – Abiel Jan 17 '11 at 05:09
@Abiel: Thanks so much for the feedback! Sometimes writing these SO posts is a lot of work. Knowing that someone at least looked at it makes the effort worth it. – dawg Jan 18 '11 at 03:20

score -1 · Answer 4 · edited May 23 '17 at 12:04

You stated "alternatively you could say I have a list of starting positions and window lengths" which does not match your example array.

If start indicates the start and end is the length, you can get your elements this way:

>>> [i for iter in [range(s,s+e) for s,e in zip(start,end)] for i in iter]
[1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]

If you want to match your example array, and end is really the ending element -1 you can get your elements this way:

>>> [i for iter in [range(*t) for t in zip(start,end)] for i in iter]
[1, 2, 7, 8, 9, 20, 21, 22, 23, 24]
>>> somearray=np.array(_)
>>> somearray
array([1, 2, 7, 8, 9, 20, 21, 22, 23, 24])

Alternative:

>>> sum([range(*t) for t in zip(start,end)],[])
[1, 2, 7, 8, 9, 20, 21, 22, 23, 24]

Keep in mind you are just generating a list of integers described in your tuples as an index to your numpy array. Any of these can use xrange vs range if that is faster / better in your case.

@Andrew Jaffe: He stated two things that were different and I answered both. — the wolf, Jan 16 '11 at 21:34

Construct Numpy index given list of starting and ending positions

4 Answers4

Linked