
I have a one-dimensional array of boolean values that I am trying to bin (average over larger bins) so that a bin is True if any of the values inside it is True.

I have been trying to do it following the approach in https://stackoverflow.com/a/21712136/3275464

import numpy as N
data = N.random.randint(2,size=100).astype(bool) #generating array of random booleans
bins = N.linspace(0,100,11,1).astype(int) #the array containing the bins
binned_data = N.logical_or.reduceat(data,bins[:-1])

but the last line gives me the following error:

 TypeError: array cannot be safely cast to required type

It seems to me that it should work just as it does with averaging.
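For comparison, here is a minimal sketch of the averaging version next to the boolean one. It assumes a NumPy release newer than 1.6.2 (per the comments, `logical_or.reduceat` runs fine on 1.8.2); the random data and the `starts` name are illustrative only.

```python
import numpy as np

# Illustrative data: 100 random booleans and 10 equal-width bins.
rng = np.random.RandomState(0)
data = rng.randint(2, size=100).astype(bool)
starts = np.arange(0, 100, 10)  # start index of each bin (same as bins[:-1])

# Averaging with reduceat: sum each bin, then divide by the bin lengths.
counts = np.diff(np.append(starts, len(data)))
means = np.add.reduceat(data.astype(float), starts) / counts

# The boolean analogue: a bin is True if any of its values is True.
any_true = np.logical_or.reduceat(data, starts)

# A bin has a nonzero mean exactly when it contains at least one True.
assert (any_true == (means > 0)).all()
```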

I am using numpy 1.6.2 by the way.

Do you see where I am making a mistake?

Learning is a mess
  • Are the bins always uniform in size or can the size vary between bins? – TheBlackCat Jul 28 '15 at 13:55
  • Is `slices` the same as `bins`? (If so, your code runs without error using NumPy version 1.8.2). – unutbu Jul 28 '15 at 13:55
  • @TheBlackCat: They are all the same size except for the last one, which may be shorter if `len(data)` is not divisible by the bin width. – Learning is a mess Jul 28 '15 at 14:00
  • @unutbu: Yes, they are the same; just corrected, thank you. Then it may be a bug in my NumPy version, but sadly I can't upgrade it. I will have to go for another option! – Learning is a mess Jul 28 '15 at 14:01

2 Answers


If the bins are always the same size, the simplest approach would be to reshape and then use any to find if any value in a row is True:

import numpy as np
data = np.random.randint(2,size=100).astype('bool')
binned_data = data.reshape((-1, 10)).any(axis=1)  # one row per bin of 10 consecutive values
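The reshape only works when `len(data)` is a multiple of the bin width. To handle the shorter last bin mentioned in the comments, one option (a sketch, not part of the original answer) is to pad with False, which cannot change the result of `any`. `bin_any` is a hypothetical helper name. Since it avoids `reduceat`, it should also sidestep the casting error on older NumPy, though that is untested here.

```python
import numpy as np

def bin_any(data, width):
    """Return one bool per bin of `width` consecutive values (hypothetical helper).

    The last bin may be shorter; it is padded with False, which cannot
    change the result of any().
    """
    data = np.asarray(data, dtype=bool)
    n_bins = -(-len(data) // width)  # ceiling division
    padded = np.zeros(n_bins * width, dtype=bool)
    padded[:len(data)] = data
    # Each row is one bin; any(axis=1) reduces within a row.
    return padded.reshape((n_bins, width)).any(axis=1)

print(bin_any([False, False, True, False, False, False, False], 3))
# [ True False False]
```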

If the bins aren't always the same size, the simplest approach would be to get the indexes of True values, then do a histogram, then find all the nonzero histogram bins:

import numpy as np
data = np.random.randint(2,size=100).astype('bool')
bins = np.linspace(0,100,11,1)
inds = np.where(data)[0]
binned_data = np.histogram(inds, bins=bins)[0].astype('bool')
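Another option for bins of arbitrary size, offered as an assumption rather than as part of the original answer, uses a running count of True values: the number of Trues in a bin is the difference of the cumulative count at its two edges. It needs only `cumsum` and integer indexing, and an empty bin (two equal starts) comes out False. `any_per_bin` is a hypothetical helper name.

```python
import numpy as np

def any_per_bin(data, starts):
    """True for each bin that contains at least one True value (hypothetical helper).

    `starts` holds the first index of each bin in increasing order; the
    last bin runs to the end of `data`, so it may be shorter than the rest.
    """
    data = np.asarray(data, dtype=bool)
    # csum[i] is the number of True values in data[:i].
    csum = np.concatenate(([0], np.cumsum(data)))
    edges = np.append(starts, len(data))
    # Number of True values in data[edges[k]:edges[k+1]].
    counts = csum[edges[1:]] - csum[edges[:-1]]
    return counts > 0

data = np.array([False, True, False, False, False, True, False])
print(any_per_bin(data, [0, 2, 5]))
# [ True False  True]
```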
TheBlackCat

In NumPy 1.8.2 your code works fine.

Since the error indicates a problem with safely casting the values to the required type, the problem seems to be related to the values in data being bools.

Therefore, a workaround for earlier versions might be to cast the bools to ints, do the reduceat computation on ints, and then cast the result back to bools:

import numpy as np
np.random.seed(2015)

data = np.random.binomial(1, 0.1, size=100).astype(bool) 
bins = np.linspace(0,100,11,1).astype(int) 

expected = np.logical_or.reduceat(data,bins[:-1])
print(expected)
# [ True False  True  True False  True  True False False  True]

binned_data = np.add.reduceat(data.astype('int'), bins[:-1]).astype(bool)
print(binned_data)
# [ True False  True  True False  True  True False False  True]

assert (expected == binned_data).all()
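The same workaround also covers the case from the comments where the last bin is shorter: passing only the start indices to `reduceat` makes the final bin run to the end of the array. A minimal sketch, with a data length of 95 chosen purely for illustration:

```python
import numpy as np

np.random.seed(2015)
data = np.random.binomial(1, 0.1, size=95).astype(bool)  # length not a multiple of 10

# Start index of each bin of width 10; reduceat runs the last bin
# (indices 90..94) to the end of the array, so it is simply shorter.
starts = np.arange(0, len(data), 10)
binned_data = np.add.reduceat(data.astype(int), starts).astype(bool)

# Check against a plain Python loop over the same bins.
expected = np.array([data[s:s + 10].any() for s in starts])
assert (binned_data == expected).all()
print(len(binned_data))  # 10
```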
unutbu