
I'm a newbie at Python and I'm trying to bin the data of a NumPy array, but I'm really struggling to do so!

My array is a simulation of a simple particle diffusion model, where each particle has given probabilities of walking forward or backward. The model can have an arbitrary number of species of particles, and the species of each particle is encoded in the key vector: a vector of numbers ranging from 0 to nSpecies - 1, where each number appears according to a proportion chosen by the user. The size of the vector (the total number of particles) is chosen by the user as well.
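In case it helps, the key vector described above could be built roughly like this (a simplified sketch; the make_key helper, the fixed seed, and the example proportions are just for illustration, not part of my real code):

```python
import numpy as np

# Hypothetical helper: build a key vector of `size` particles where
# species i appears with proportion proportions[i] (proportions sum to 1).
def make_key(size, proportions, seed=0):
    rng = np.random.default_rng(seed)
    return rng.choice(len(proportions), size=size, p=proportions)

key = make_key(10, [0.5, 0.3, 0.2])  # e.g. 10 particles, 3 species
```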

import numpy as np

def walk(diff, key, progressProbability, recessProbability, nSpecies):
    """
    Returns an array with the positions of the particles weighted by their
    walk probabilities.
    """
    random = np.random.rand(len(key))
    forward = key.astype(float)
    backward = key.astype(float)

    # Map each particle's species to its forward/backward probability
    for i in range(nSpecies):
        forward[key == i] = progressProbability[i]
        backward[key == i] = recessProbability[i]

    diff = np.add(diff, random < forward)
    diff = np.subtract(diff, random > 1 - backward)

    return diff

To add time to this simulation, I run the walk function presented above many times. Therefore, the values in diff after running this function many times represent how far each particle has traveled.

def probability_diffusion(time, progressProbability, recessProbability,
                          changeProbability, key, nSpecies, nBins):
    populationSize = len(key)

    diff = np.zeros(populationSize, dtype=int)

    # Advance the simulation one time step at a time
    for t in range(time):
        diff = walk(diff, key, progressProbability, recessProbability, nSpecies)

    return diff

My goal is to turn this diff array into an array of size 381 without losing the information coded in it. I thought about doing so by binning the data and averaging the values in each bin.

I've tried using the scipy binned_statistic function, but I can't really wrap my head around how it works.
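From the docs, I think the call would look something like the sketch below (the length-1000 stand-in data and the positions array are my assumptions, just to show the shapes involved):

```python
import numpy as np
from scipy.stats import binned_statistic

# Stand-in for the simulation output (assumption: 1000 particles)
diff = np.random.randn(1000).cumsum()
positions = np.arange(len(diff))  # index of each particle

# Group the indices into 381 equal-width bins and average the matching
# diff values inside each bin.
binned, bin_edges, _ = binned_statistic(positions, diff,
                                        statistic="mean", bins=381)
print(binned.shape)  # (381,)
```

But I'm not sure this is the right way to map the old consecutive indices onto the new, smaller array.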

Any thoughts? Thank you.

  • Can you add the example of your data and what you've tried? – Avery Mar 02 '18 at 16:56
  • Welcome to SO. Please read [How to create a Minimal, Complete, and Verifiable example](https://stackoverflow.com/help/mcve) and [How do I ask a good question?](https://stackoverflow.com/help/how-to-ask) – Mr. T Mar 02 '18 at 17:06
  • I've added more info about the context of the problem and further clarification of the question to hopefully make it more understandable! @Avery – Bernardo Maciel Mar 02 '18 at 17:46
  • Could you give an example input and output to the specific problem. Not to the code above, but more like. "I have a `381x381` sparse matrix that I would like to compress into the form `NxN` with these characteristics. Here is what I have tried" – KDecker Mar 02 '18 at 18:21
  • @KDecker, not sure I understood, but basically I have an array len 1000 and I want to compress it into an array with len 381 where the old consecutive indexes that had to be "compressed" into one index in the new vector are some kind of mean value or median of them all. Thank you for your help! – Bernardo Maciel Mar 02 '18 at 18:38
  • Basically you just gave me/us way too much information. We just wanted to see input, output, and requirements. Maybe an SSCCE. // Given your last comment, have a look here https://stackoverflow.com/questions/28663856/how-to-count-the-occurrence-of-certain-item-in-an-ndarray-in-python . I think this is what you want. – KDecker Mar 02 '18 at 18:41
  • So the whole question is “How can I resample a histogram into a smaller number of bins?”? Except that each element of `diff` here is the _identity_ of some particle. How does it make sense to reduce to some smaller number of particles, rather than grouping them by species and computing statistics on their distributions? – Davis Herring Mar 03 '18 at 02:24
  • @DavisHerring The answer to your question would get a little more technical into the context of the problem but, basically, the simulation needs to be run with a big number of elements in order to be significant. Furthermore, to compare and optimize the parameters of my simulation (such as the probabilities of conversion, the proportions of the various types of particles and the probabilities of moving forward) according to my current implementation, I need to do some vector algebra, and that requires the data and the simulation having the same size. Hence the binning issue. – Bernardo Maciel Mar 03 '18 at 03:06

0 Answers