I have data that looks like this:

x: 0, 1, 2, 3, 4,...
y: 1, 3, 1, 4, 2,...

where y gives the frequency (number of occurrences) of each element in x.

First, I want to expand the data so that it looks like this:

data: 0, 1, 1, 1, 2, 3, 3, 3, 3, 4, 4,...

I would then like to compute the mean and the median of these values. For the data shown above, the mean should be 2.27 and the median should be 3.

I'm wondering what the best way to do this would be: creating a dictionary from the x and y values, or something else.

Thanks in advance.

cs95
Maxwell's Daemon

1 Answer

Those data elements appear to be the values from x, each repeated as many times as the corresponding element in y. Thus, we could use np.repeat -

data = np.repeat(x,y)

Then, simply get the mean and median values with np.mean(data) and np.median(data).

Alternatively, a more efficient way to get the mean, one that avoids building the expanded array, is to take the inner product of x and y and divide by the sum of y -

np.inner(x,y)/float(y.sum())
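
The same weighted mean can probably also be written with np.average and its weights argument. A minimal sketch, assuming x and y are equal-length NumPy arrays like the ones in the question (the name weighted_mean is only illustrative):

```python
import numpy as np

# Values and their frequencies, as given in the question
x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 1, 4, 2])

# Weighted average of x with y as weights: sum(x * y) / sum(y)
weighted_mean = np.average(x, weights=y)
print(weighted_mean)
```

This should agree with np.mean(np.repeat(x, y)) up to floating-point rounding, without materializing the repeated array.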

Sample run -

In [57]: x
Out[57]: array([0, 1, 2, 3, 4])

In [58]: y
Out[58]: array([1, 3, 1, 4, 2])

In [59]: data = np.repeat(x,y)

In [65]: data
Out[65]: array([0, 1, 1, 1, 2, 3, 3, 3, 3, 4, 4])

In [60]: np.mean(data)
Out[60]: 2.2727272727272729

In [61]: np.median(data)
Out[61]: 3.0

In [62]: np.inner(x,y)/float(y.sum())
Out[62]: 2.2727272727272729
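
The median can also be obtained without expanding the data, by locating the middle position(s) in the cumulative counts. This is only a sketch under stated assumptions: x is sorted in ascending order, y holds non-negative integer counts with a positive total, and median_from_counts is a hypothetical helper name, not a NumPy function.

```python
import numpy as np

def median_from_counts(x, y):
    """Median of np.repeat(x, y) computed without building that array.

    Assumes x is sorted ascending and y contains non-negative integer
    counts whose total is at least 1.
    """
    x = np.asarray(x)
    y = np.asarray(y)
    cum = np.cumsum(y)   # cum[i] = number of expanded items with value <= x[i]
    n = cum[-1]          # total length of the expanded data
    # 0-based positions of the middle element(s); equal when n is odd
    lo_pos, hi_pos = (n - 1) // 2, n // 2
    # First index i with cum[i] > pos is the value found at that position
    lo = x[np.searchsorted(cum, lo_pos, side="right")]
    hi = x[np.searchsorted(cum, hi_pos, side="right")]
    return (lo + hi) / 2.0

x = np.array([0, 1, 2, 3, 4])
y = np.array([1, 3, 1, 4, 2])
print(median_from_counts(x, y))
```

For small inputs like the one in the question, np.median(np.repeat(x, y)) is simpler; the cumulative-count version mainly helps when the counts are large enough that the expanded array would be costly to build.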
Divakar