7

I have two identically-sized numpy.array objects (both one-dimensional), one of which contains a list of starting index positions, and the other of which contains a list of ending index positions (alternatively you could say I have a list of starting positions and window lengths). In case it matters, the slices formed by the starting and ending positions are guaranteed to be non-overlapping. I am trying to figure out how to use these starting and ending positions to form an index for another array object, without having to use a loop.

For example:

import numpy as np
start = np.array([1,7,20])
end = np.array([3,10,25])

Want to reference

somearray[1,2,7,8,9,20,21,22,23,24])
Abiel
  • 5,251
  • 9
  • 54
  • 74

4 Answers4

7

I would use

np.r_[tuple(slice(s, e) for s, e in zip(start, end))]

EDIT: Here is a solution that does not use a Python loop:

def indices(start, end):
    lens = end - start
    np.cumsum(lens, out=lens)
    i = np.ones(lens[-1], dtype=int)
    i[0] = start[0]
    i[lens[:-1]] += start[1:]
    i[lens[:-1]] -= end[:-1]
    np.cumsum(i, out=i)
    return i

This only creates a single temporary NumPy array (lens) and is much faster than any of the other solutions.

Sven Marnach
  • 574,206
  • 118
  • 941
  • 841
2

Numpy's arange creates each individual sequence, so just string them together. How about this?

In [11]: idx = np.hstack([np.arange(s,e) for s,e in  zip(start, end)])

In [12]: idx
Out[12]: array([ 1,  2,  7,  8,  9, 20, 21, 22, 23, 24])

And then you can access somearray[idx].

Andrew Jaffe
  • 26,554
  • 4
  • 50
  • 59
  • 1
    Equivalently, you can just do `np.hstack([np.r_[s:e] for s,e in zip(start, end)])`. It's a bit less readable, if you're not familiar with `numpy.r_`, though. – Joe Kington Jan 16 '11 at 20:00
  • Thanks Andrew and Joe. One question is, is there some way to do this while avoiding even the list comprehension? Certainly this is fast, but I still find that most of my function's runtime occurs on the line building up idx (the rest of the function uses numpy routines to identify start and end positions for windows that match a certain length criteria). – Abiel Jan 16 '11 at 21:54
  • Both these methods generate the intermediate arrays vs just generating the the integer indices of the arrays. Kinda wasteful if you are talking large arrays. – the wolf Jan 16 '11 at 22:13
  • Numpy can only deal with fixed-length arrays, so it's hard to imagine a (simple) version of the algorithm that doesn't go via an intermediate list. – Andrew Jaffe Jan 16 '11 at 22:44
  • True: The numpy array must be defined and fixed at the time of creation. However, the portion `[np.arange(s,e) for s,e in zip(start, end)]` generates 3 numpy arrays that are then stacked using `hstack` So you are creating 4 numpy arrays to end up with one. If, instead, you use slice syntax with `r_` or create one python list then you are only creating 1 or 2 lists in total... – the wolf Jan 17 '11 at 04:36
0

How about this:

>>> import numpy as np
>>> start = np.array([1,7,20])
>>> end = np.array([3,10,25])
>>> na=np.fromiter(sum([range(s,e) for s,e in zip(start,end)],[]),np.int)
>>> na
array([ 1,  2,  7,  8,  9, 20, 21, 22, 23, 24])

The advantage is 1) no intermediate numpy float arrays; 2) resulting array is integer to most efficiently address other numpy arrays.

dawg
  • 98,345
  • 23
  • 131
  • 206
  • Thanks drewk, unfortunately I found this an order of magnitude slower than solutions by Andrew/Joe/Sven. Operating over an array with 100,000 elements and many starting and ending points, Joe's solution executed slightly over 40x as fast. – Abiel Jan 17 '11 at 05:09
  • @Abiel: Thanks so much for the feedback! Sometimes writing these SO posts is a lot of work. Knowing that someone at least looked at it makes the effort worth it. – dawg Jan 18 '11 at 03:20
-1

You stated "alternatively you could say I have a list of starting positions and window lengths" which does not match your example array.

If start indicates the start and end is the length, you can get your elements this way:

>>> [i for iter in [range(s,s+e) for s,e in zip(start,end)] for i in iter]
[1, 2, 3, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 20, 21, 22, 23, 24, 25, 26, 
27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44]

If you want to match your example array, and end is really the ending element -1 you can get your elements this way:

>>> [i for iter in [range(*t) for t in zip(start,end)] for i in iter]
[1, 2, 7, 8, 9, 20, 21, 22, 23, 24]
>>> somearray=np.array(_)
>>> somearray
array([1, 2, 7, 8, 9, 20, 21, 22, 23, 24])

Alternative:

>>> sum([range(*t) for t in zip(start,end)],[])
[1, 2, 7, 8, 9, 20, 21, 22, 23, 24]

Keep in mind you are just generating a list of integers described in your tuples as an index to your numpy array. Any of these can use xrange vs range if that is faster / better in your case.

Community
  • 1
  • 1
the wolf
  • 34,510
  • 13
  • 53
  • 71