How to compute a median and percentiles in numpy on summarised (observation, number_of_responses) data?

Asked Mar 18 '18 at 23:23

Active Mar 18 '18 at 23:23

Viewed 200 times

NumPy's documented median and percentile functions work on raw (unaggregated) observations:

import numpy as np

raw_data = [1, 1, 1, 1, 2, 3, 4, 4, 5, 5]  # observations

a = np.array(raw_data)
np.median(a)          # 2.5
np.percentile(a, 50)  # 2.5 (the 50th percentile is the median)
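Both calls rely on NumPy's default linear interpolation between order statistics (the default `method='linear'`; NumPy versions before 1.22 call this parameter `interpolation`). Writing the rule out shows what a version working on summarised data has to reproduce: only the one or two sorted observations around rank $h$ matter, not the full expanded array.

```latex
% Sorted observations x_0 <= x_1 <= ... <= x_{n-1}, percentile q in [0, 100]
h = \frac{(n-1)\,q}{100}, \qquad
P_q = x_{\lfloor h \rfloor} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{\lceil h \rceil} - x_{\lfloor h \rfloor}\bigr)

% raw_data: n = 10, q = 50
h = \frac{9 \cdot 50}{100} = 4.5, \quad x_4 = 2, \quad x_5 = 3, \quad
P_{50} = 2 + 0.5\,(3 - 2) = 2.5
```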

How can I get the same result from summarised data such as summarised_data below, without expanding it first?

summarised_data = [[1,4],[2,1],[3,1],[4,2],[5,2]]  # [[observation, number_of_responses], [...]]

That is, without performing the equivalent of:

data = [obs for obs, count in summarised_data for _ in range(count)]
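A minimal sketch, not taken from the question or the comments: sort by observation, take the cumulative sum of the counts, and use `np.searchsorted` to map a 0-based rank in the (never built) expanded array to its observation. Combined with the linear-interpolation rule that `np.percentile` uses by default, this should reproduce `np.percentile` on the expanded data. `percentile_from_counts` is a hypothetical helper name, and the sketch assumes non-negative integer counts with a positive total.

```python
import numpy as np


def percentile_from_counts(values, counts, q):
    """Percentile(s) of (observation, count) data without expanding it.

    Hypothetical helper. Intended to match np.percentile's default linear
    method on the expanded data; assumes non-negative integer counts whose
    sum is positive.
    """
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=np.int64)

    # Sort by observation so the cumulative counts describe sorted data
    order = np.argsort(values)
    values, counts = values[order], counts[order]

    cum = np.cumsum(counts)  # cum[k] = total responses in groups 0..k
    n = cum[-1]              # total number of responses

    # Fractional 0-based rank into the expanded, sorted data
    h = (n - 1) * np.asarray(q, dtype=float) / 100.0
    lo = np.floor(h).astype(np.int64)
    hi = np.ceil(h).astype(np.int64)

    # Expanded index i belongs to the first group whose cumulative count exceeds i
    x_lo = values[np.searchsorted(cum, lo, side="right")]
    x_hi = values[np.searchsorted(cum, hi, side="right")]
    return x_lo + (h - lo) * (x_hi - x_lo)


summarised_data = [[1, 4], [2, 1], [3, 1], [4, 2], [5, 2]]
values, counts = np.array(summarised_data).T

print(percentile_from_counts(values, counts, 50))  # 2.5, same as np.median(raw_data)
print(percentile_from_counts(values, counts, [25, 50, 75]))
```

Only one cumulative count per distinct observation is stored, so memory scales with the number of distinct observations rather than the number of responses. The other `method=` options of `np.percentile` would need a different rank rule.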


Greg

Weighted medians are covered here: https://stackoverflow.com/questions/20601872/numpy-or-scipy-to-calculate-weighted-median – Stephen Rauch Mar 18 '18 at 23:44
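The linked question deals with weighted medians in general. When the weights are integer response counts, as here, the median only needs the middle rank(s) of the expanded data: one rank when the total count is odd, two (averaged) when it is even, which is what `np.median` does. A minimal sketch under that assumption; `median_from_counts` is a hypothetical name, not a NumPy function.

```python
import numpy as np


def median_from_counts(values, counts):
    """Median of (observation, count) data, intended to match np.median on the
    expanded data. Assumes non-negative integer counts with a positive total."""
    values = np.asarray(values, dtype=float)
    counts = np.asarray(counts, dtype=np.int64)

    # Sort by observation, keeping each count aligned with its observation
    order = np.argsort(values)
    values, counts = values[order], counts[order]

    cum = np.cumsum(counts)
    n = int(cum[-1])

    # 0-based index (odd n) or indices (even n) of the middle element(s)
    mid = [n // 2] if n % 2 else [n // 2 - 1, n // 2]

    # Each middle index falls in the first group whose cumulative count exceeds it
    idx = np.searchsorted(cum, mid, side="right")
    return float(values[idx].mean())


print(median_from_counts([1, 2, 3, 4, 5], [4, 1, 1, 2, 2]))  # 2.5
```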
Seems like you're asking to find the median given a histogram with bins of size 1. This may help, but frankly might be more work than the "expanding" you're talking about. https://math.stackexchange.com/q/879052 – Brad Solomon Mar 18 '18 at 23:51

0 Answers