
I have a 1D numpy array and a set of offset/length values. For each pair I would like to extract the entries in the range [offset, offset+length) and concatenate them into a new 'reduced' 1D array that consists only of the values picked out by the offset/length pairs.

For a single offset/length pair this is trivial with standard array slicing [offset:offset+length]. But how can I do this efficiently (i.e. without any loops) for many offset/length values?

Thanks, Mark

Mark
  • So, ideally speaking, what would you end up with? A 2D array? – Henry Gomersall Jun 16 '12 at 08:40
  • No, again a 1D array, which just consists of the values picked out from the original 1D array based on the offset/length values. – Mark Jun 16 '12 at 08:45
  • I take it the `offset/length values` are arrays of some sort, or do you simply want to partition your array into a set of smaller arrays? – Samy Vilar Jun 16 '12 at 08:47
  • yes, offset/length are arrays. I do not really want to partition since I want to have one 1D array in the end. So I would need the concatenation of the partial smaller arrays you mention. but all without loops. – Mark Jun 16 '12 at 08:50

2 Answers

>>> import numpy as np
>>> a = np.arange(100)
>>> ind = np.concatenate((np.arange(5),np.arange(10,15),np.arange(20,30,2),np.array([8])))
>>> a[ind]
array([ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 22, 24, 26, 28,  8])
fraxel
  • On a side note, `np.r_` is quite nice for what you're doing with `concatenate`. Your long concatenate line reduces to `ind = np.r_[:5, 10:15, 20:30:2, 8]` – Joe Kington Jun 16 '12 at 13:05
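
To illustrate the `np.r_` suggestion in the comment above, here is a minimal sketch checking that it builds the same index array as the `concatenate` call in the answer. The variable names `ind_concat`, `ind_r` and `picked` are just for illustration.

```python
import numpy as np

a = np.arange(100)

# Index array built as in the answer, with explicit arange pieces.
ind_concat = np.concatenate((np.arange(5), np.arange(10, 15),
                             np.arange(20, 30, 2), np.array([8])))

# The same indices written with slice notation via np.r_.
ind_r = np.r_[:5, 10:15, 20:30:2, 8]

# Both spellings should describe identical indices.
same = np.array_equal(ind_concat, ind_r)

# Index with the array itself (not wrapped in an extra list).
picked = a[ind_r]
```

Note that `np.r_` only helps when the slices are known when writing the code; with offsets and lengths stored in arrays the indices still have to be computed.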

There is the naive method; just doing the slices:

>>> import numpy as np
>>> a = np.arange(100)
>>> 
>>> offset_length = [(3,10),(50,3),(60,20),(95,1)]
>>>
>>> np.concatenate([a[offset:offset+length] for offset,length in offset_length])
array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
       64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])

The following might be faster, but you would have to test/benchmark.

It works by constructing a list of the desired indices, which is a valid way of indexing a numpy array.

>>> indices = [offset + i for offset,length in offset_length for i in range(length)]
>>> a[indices]
array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 50, 51, 52, 60, 61, 62, 63,
       64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 95])

It's not clear whether this is actually faster than the naive method; it might be if you have a lot of very short intervals.

(This last method is basically the same as @fraxel's solution, just using a different method of making the index list.)
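
Since the question asks for a version without any Python-level loop, here is a sketch of how the same index array could be built with `np.repeat` and `np.cumsum` when the offsets and lengths are already integer arrays. This is not one of the methods timed in this answer, and `interval_indices` is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def interval_indices(offsets, lengths):
    """Flat indices covering [offset, offset + length) for each pair.

    Hypothetical helper; assumes integer inputs of equal size.
    """
    offsets = np.asarray(offsets)
    lengths = np.asarray(lengths)
    # Position in the output array where each interval starts.
    starts = np.cumsum(lengths) - lengths
    # Position of every output element inside its own interval: 0 .. length-1.
    within = np.arange(lengths.sum()) - np.repeat(starts, lengths)
    # Shift each run by its interval's offset to get source indices.
    return np.repeat(offsets, lengths) + within

a = np.arange(100)
offsets = np.array([3, 50, 60, 95])
lengths = np.array([10, 3, 20, 1])

reduced = a[interval_indices(offsets, lengths)]
```

Whether this beats the slicing approach would need to be benchmarked for your interval counts and sizes; it allocates several temporary arrays as long as the output.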


Performance testing

I've tested a few different cases: a few short intervals, a few long intervals, lots of short intervals. I used the following script:

import timeit

setup = 'import numpy as np; a = np.arange(1000); offset_length = %s'

for title, ol in [('few short', '[(3,10),(50,3),(60,10),(95,1)]'),
                  ('few long', '[(3,100),(200,200),(600,300)]'),
                  ('many short', '[(2*x,1) for x in range(400)]')]:
    print('**', title, '**')
    print('dbaupp 1st:', timeit.timeit('np.concatenate([a[offset:offset+length] for offset,length in offset_length])', setup % ol, number=10000))
    print('dbaupp 2nd:', timeit.timeit('a[[offset + i for offset,length in offset_length for i in range(length)]]', setup % ol, number=10000))
    print('    fraxel:', timeit.timeit('a[np.concatenate([np.arange(offset,offset+length) for offset,length in offset_length])]', setup % ol, number=10000))

This outputs:

** few short **
dbaupp 1st: 0.0474979877472
dbaupp 2nd: 0.190793991089
    fraxel: 0.128381967545
** few long **
dbaupp 1st: 0.0416231155396
dbaupp 2nd: 1.58000087738
    fraxel: 0.228138923645
** many short **
dbaupp 1st: 3.97210478783
dbaupp 2nd: 2.73584890366
    fraxel: 7.34302687645

This suggests that my first method is the fastest when you have a few intervals (and it is significantly faster), and my second is the fastest when you have lots of intervals.
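
The timings are only comparable if all three expressions return the same array, so here is a small sanity check over the same three test cases. The function names `by_slices`, `by_index_list` and `by_arange` are made up for this sketch.

```python
import numpy as np

a = np.arange(1000)

# The same interval sets used in the timing script.
cases = {
    'few short': [(3, 10), (50, 3), (60, 10), (95, 1)],
    'few long': [(3, 100), (200, 200), (600, 300)],
    'many short': [(2 * x, 1) for x in range(400)],
}

def by_slices(a, offset_length):
    # Concatenate one slice per interval.
    return np.concatenate([a[o:o + n] for o, n in offset_length])

def by_index_list(a, offset_length):
    # Index with a Python list of all wanted positions.
    return a[[o + i for o, n in offset_length for i in range(n)]]

def by_arange(a, offset_length):
    # Index with a concatenation of one arange per interval.
    return a[np.concatenate([np.arange(o, o + n) for o, n in offset_length])]

# For each case, record whether the three approaches agree.
results = {}
for title, ol in cases.items():
    expected = by_slices(a, ol)
    results[title] = (np.array_equal(expected, by_index_list(a, ol))
                      and np.array_equal(expected, by_arange(a, ol)))
```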

huon
  • this is what I am looking for, but is there a way to get it without the for loop? – Mark Jun 16 '12 at 08:55
  • @MarkVogelsberger, are you trying to remove the for loop for performance reasons? If so, you should test these (and fraxel's) to see if they are fast enough, so that you can avoid unnecessary micro-optimisation: only if none of these are quick enough should you worry about completely removing the for loops. – huon Jun 16 '12 at 08:58
  • @MarkVogelsberger, I've added some performance statistics to my answer. – huon Jun 16 '12 at 09:30