I have a large array of continuous values in the range (-100, 100).

Now, for this array, I want to calculate the weighted average described here.

Since the values are continuous, I also want to set breaks every 20, i.e. the values should be discretized as -100, -80, -60, ..., 60, 80, 100.

How can I do this in NumPy or python in general?

EDIT: the difference from the normal mean is that the mean is calculated according to the frequency of the values.

Andrew Raafat
  • possible duplicate of [Calculating arithmetic mean (average) in Python](http://stackoverflow.com/questions/7716331/calculating-arithmetic-mean-average-in-python) – David Greydanus May 05 '15 at 15:25
  • Could you please explain what you mean by breaks? – David Greydanus May 05 '15 at 15:27
  • Already implemented in `numpy` as `average`. Check [here](http://docs.scipy.org/doc/numpy/reference/generated/numpy.average.html) – Ashish May 05 '15 at 15:36
  • Well, that function requires the weights to be an already defined list, which my problem does not provide since the values are continuous. You can check @PascalvKooten's solution; it's pretty neat. – Andrew Raafat May 05 '15 at 21:15

2 Answers

You actually have 2 different questions.

  1. How to make data discrete, and
  2. How to make a weighted average.

It's usually better to ask 1 question at a time, but anyway.

Given your specification:

xmin = -100
xmax = 100
binsize = 20

First, let's import numpy and make some data:

import numpy as np
data = np.array(range(xmin, xmax))

Then let's make the bin edges you are looking for:

bins_arange = np.arange(xmin, xmax + 1, binsize)

From this we can convert the data to the discrete form:

counts, edges = np.histogram(data, bins=bins_arange)

Now, to calculate the weighted average, we can represent each bin by its midpoint (e.g. values between -100 and -80 are treated as -90):

bin_middles = (edges[:-1] + edges[1:]) / 2

Note that this method does not require the bins to be evenly spaced, unlike the integer division method.
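To illustrate the point about uneven spacing, here is a small sketch. The sample values and the bin edges are made up for the example and are not part of the question:

```python
import numpy as np

# Hypothetical sample values and unevenly spaced bin edges (assumptions)
data = np.array([-95, -60, -5, 0, 3, 40, 99])
edges = np.array([-100, -50, -10, 10, 50, 100])

# Count how many values fall into each bin
counts, _ = np.histogram(data, bins=edges)

# Midpoint of each bin, computed the same way as for even bins
bin_middles = (edges[:-1] + edges[1:]) / 2
```

The midpoint formula is unchanged; only the edges passed to `np.histogram` differ.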

Then let's make some weights:

weights = np.array(range(len(counts))) / sum(range(len(counts)))

Then to bring it all together:

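# plain (frequency-weighted) mean of the bin midpoints
average          = np.sum(bin_middles * counts) / np.sum(counts)
# extra per-bin weights: normalize by the total weight actually applied
weighted_average = np.sum(bin_middles * counts * weights) / np.sum(counts * weights)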
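The steps above can also be collected into one runnable sketch. It assumes random sample data (the question does not provide any) and uses `np.average` with the bin counts as weights, which should give the same frequency-weighted mean as the manual sum:

```python
import numpy as np

xmin, xmax, binsize = -100, 100, 20

# Hypothetical sample data: uniform random values in [xmin, xmax)
rng = np.random.default_rng(0)
data = rng.uniform(xmin, xmax, size=1000)

# Bin edges at -100, -80, ..., 100
edges = np.arange(xmin, xmax + 1, binsize)

# Number of values in each bin
counts, _ = np.histogram(data, bins=edges)

# Bin midpoints: -90, -70, ..., 90
bin_middles = (edges[:-1] + edges[1:]) / 2

# Frequency-weighted mean of the midpoints
binned_mean = np.average(bin_middles, weights=counts)
```

Because every value lies within half a bin width of its midpoint, `binned_mean` cannot differ from `data.mean()` by more than `binsize / 2`.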
PascalVKooten

For the discretization (breaks), here is a method using Python's integer (floor) division:

import numpy as np
values = np.array([0, 5, 10, 11, 21, 24, 48, 60])
(values // 20) * 20
# or (values / 20).astype(int) * 20 to truncate toward zero instead

That will print:

array([ 0,  0,  0,  0, 20, 20, 40, 60])
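Since the question's range includes negative values, it may be worth checking how the two variants behave there. The sketch below uses made-up sample values: floor division rounds toward negative infinity, while casting to `int` truncates toward zero.

```python
import numpy as np

# Hypothetical values covering both signs (assumption, not from the question)
values = np.array([-95, -25, -5, 5, 25, 95])

# Floor division: each value goes to the break at or below it
floored = (values // 20) * 20

# Truncation: each value goes to the break closer to zero
truncated = (values / 20).astype(int) * 20
```

Here `floored` is `[-100, -40, -20, 0, 20, 80]` and `truncated` is `[-80, -20, 0, 0, 20, 80]`, so the two only agree for non-negative inputs.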

For the weighted mean, if you have another array with the weights for each point, you can use:

weighted_mean = sum(w * v for w, v in zip(weights, values)) / sum(weights)
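As a quick check, the manual formula can be compared against `np.average`, which accepts a `weights` argument. The values and weights below are invented for the example:

```python
import numpy as np

# Hypothetical discretized values and per-point weights (assumptions)
values = np.array([0, 20, 40, 60])
weights = np.array([4, 2, 1, 1])

# Manual weighted mean: sum of w*v divided by the total weight
manual = sum(w * v for w, v in zip(weights, values)) / sum(weights)

# NumPy equivalent
builtin = np.average(values, weights=weights)
```

Both give 17.5 here, i.e. (0*4 + 20*2 + 40*1 + 60*1) / 8.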
stellasia