You have a frequency table, so the median is the first value in x[:, 0] whose
cumulative frequency passes the midpoint of the total count.
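
To make the idea concrete, here is a minimal sketch on a made-up table. The values and counts are illustrative and not taken from your data, and it only covers the simple case where the total count is odd:

```python
import numpy as np

# Hypothetical table: 1 occurs twice, 5 occurs three times, 9 occurs twice.
table = np.array([[1, 2], [5, 3], [9, 2]])

cf = np.cumsum(table[:, 1])  # cumulative frequencies: [2, 5, 7]
n = cf[-1]                   # total number of elements
mid = n // 2                 # 0-based position of the middle element (odd n)

# Index of the first bucket whose cumulative frequency exceeds `mid`.
bucket = np.searchsorted(cf, mid, side="right")
median = table[bucket, 0]
print(median)
```

The expanded array would be [1, 1, 5, 5, 5, 9, 9], and its middle element sits in the second bucket.
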
You can use:

    import numpy as np


    def median_freq_table(freq_table: np.ndarray) -> float:
        """
        Find the median of an array represented as a frequency table [[val, freq]].

        Rows are assumed to be sorted by value in ascending order.
        """
        values = freq_table[:, 0]
        freqs = freq_table[:, 1]
        # cumulative frequencies
        cf = np.cumsum(freqs)
        # total number of elements
        n = cf[-1]
        # masks selecting the buckets at or after the two middle elements
        # (0-based positions n // 2 - 1 and n // 2)
        left = (n // 2 - 1) < cf
        right = (n // 2) < cf
        # odd length, or both middle elements fall in the same bucket:
        # the median is that bucket's value
        if n % 2 == 1 or (left == right).all():
            return values[right][0]
        # even length with the middle elements in different buckets:
        # take the first bucket of each mask, so zero-frequency rows are skipped
        return np.mean([values[left][0], values[right][0]])

Your input:

    >>> xs = np.array(
    ...     [
    ...         [10000, 329],
    ...         [20000, 329],
    ...         [30000, 323],
    ...         [40000, 310],
    ...         [50000, 284],
    ...         [60000, 232],
    ...         [70000, 189],
    ...         [80000, 130],
    ...         [90000, 87],
    ...         [100000, 71],
    ...     ]
    ... )
    >>> median_freq_table(xs)
    40000

Simple, even-length array:

    >>> xs = np.array([[1, 3], [10, 3]])
    >>> median_freq_table(xs)
    5.5
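
If you want to sanity-check results like these, one option is to expand the table into the full array and take its median directly. The sketch below assumes integer frequencies and a table small enough to expand in memory; the helper name `median_by_expansion` is made up for illustration:

```python
import numpy as np


def median_by_expansion(freq_table: np.ndarray) -> float:
    """Expand [[val, freq]] rows into the full array and take its median."""
    values = freq_table[:, 0]
    freqs = freq_table[:, 1]
    # np.repeat repeats each value as many times as its frequency
    return float(np.median(np.repeat(values, freqs)))


xs = np.array([[1, 3], [10, 3]])
print(median_by_expansion(xs))  # 5.5
```

This costs memory proportional to the total count, so it is better suited to testing than to large tables.
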