
I have a numpy array with shape:

In: imar.shape 
Out: (21, 77, 10000)

I want a binned sum on the last axis, with every bin containing 20 items.

The way I'm doing this now is:

np.sum(imar.reshape([-1, 500, 20]), axis=2).reshape(imar.shape[:2] + (-1,))

It's fast, but seems error-prone if I get the arguments to reshape wrong. Is there a better way to do this?

I've looked at np.digitize, np.histogram, np.bincount, and some others, but those bin by value; I want a sum over fixed ranges of indices.

Isaiah Norton
  • If you are concerned about getting the arguments wrong, why can't you just wrap it in a function? – David Cournapeau Apr 13 '11 at 22:31
  • Replace `500` with `imar.shape[-1] // 20` and assert that `imar.shape[-1] % 20` is zero, and I think you are solid. You can speed things up further by doing `imar.shape = (x,y,z)` rather than calling the more expensive `reshape` – Paul Apr 13 '11 at 22:40
  • Thanks, I'll write a function along the lines of "binsum(array, axis=?, bins=[?])"... I guess I was hoping there was some more elegant way because the reshape method seems ugly to me. Thanks. – Isaiah Norton Apr 14 '11 at 01:21
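
A function along the lines suggested in these comments might look like the sketch below. The name `binsum` and its parameters are hypothetical, not from any library; it assumes equal-sized bins and raises an error if the axis length is not a multiple of the bin size.

```python
import numpy as np

def binsum(arr, bin_size, axis=-1):
    """Sum consecutive groups of `bin_size` items along `axis` (hypothetical helper)."""
    arr = np.asarray(arr)
    n = arr.shape[axis]
    if n % bin_size != 0:
        raise ValueError(
            f"axis length {n} is not a multiple of bin size {bin_size}"
        )
    # Move the binned axis to the end so the reshape only splits that axis.
    moved = np.moveaxis(arr, axis, -1)
    # Split the last axis into (number of bins, bin_size) and sum each bin.
    binned = moved.reshape(moved.shape[:-1] + (n // bin_size, bin_size)).sum(axis=-1)
    # Put the (now shorter) axis back in its original position.
    return np.moveaxis(binned, -1, axis)
```

For an array of shape (21, 77, 10000) and `bin_size=20`, this returns an array of shape (21, 77, 500). Deriving the bin count from the shape, rather than hard-coding `500`, removes the main source of error in the hand-written reshape.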

1 Answer


You have the right approach. I asked a similar question a while back:

How can I efficiently process a numpy array in blocks similar to Matlab's blkproc (blockproc) function

There are several approaches to handling the reshape. If you are careful and write a function to do it, you'll be all right. Of course, you need to trim your input array first if its length along the binned axis isn't an integer multiple of your block size.
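
The trimming step could be sketched as follows. Both function names are hypothetical. The first drops any leftover items at the end of the last axis before reshaping; the second is an alternative that keeps the partial final bin by using `np.add.reduceat`, on the assumption that a shorter last bin is acceptable.

```python
import numpy as np

def binsum_trim(arr, bin_size):
    """Binned sum over the last axis, discarding items that don't fill a bin."""
    arr = np.asarray(arr)
    n_bins = arr.shape[-1] // bin_size
    # Keep only the leading part of the axis that divides evenly into bins.
    trimmed = arr[..., : n_bins * bin_size]
    return trimmed.reshape(arr.shape[:-1] + (n_bins, bin_size)).sum(axis=-1)

def binsum_keep_tail(arr, bin_size):
    """Binned sum over the last axis; the final bin may hold fewer items."""
    arr = np.asarray(arr)
    # Start index of each bin; reduceat sums from each start to the next one.
    starts = np.arange(0, arr.shape[-1], bin_size)
    return np.add.reduceat(arr, starts, axis=-1)

a = np.arange(10)
print(binsum_trim(a, 4))       # [ 6 22]     (items 8 and 9 are dropped)
print(binsum_keep_tail(a, 4))  # [ 6 22 17]  (last bin holds only 8 and 9)
```

Which behaviour is right depends on whether a short final bin is meaningful for the data; trimming keeps every bin comparable, while `reduceat` keeps every item.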

Carl F.
  • Thanks, the linked answer is really helpful and a great general solution. Plus I learned a lot reading the links about strides! – Isaiah Norton Apr 15 '11 at 04:08