I have a dataframe that is similar to:
I would like to calculate the median age for each city, but because the data is a frequency table I'm finding it tricky. Is there a function in pandas or another library that would help me do this?
For each row, add up the counts to get the total number of people. The median is the person in the middle of that total. Walk through the age columns in ascending order, keeping a running total of people. The age at which the running total first reaches the middle position is the median. (If the total is even, there are two middle people, so average their ages.)
For example, for the row 'Alabama', you would add 34 + 67 + ... + 23 = 5463. That total is odd, so the middle person is number (5463 + 1) / 2 = 2732. The running total is 891 after age 5 and 5437 after age 6, so person 2732 falls in the age-6 group and the median age is 6.
Do this for each city/state to get the median for each one.
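The running-total walk described above can be written without expanding the table into one entry per person. This is a minimal sketch. It assumes the ages are sorted ascending, the counts are non-negative integers, and at least one count is positive. The function name `freq_median` and the ages 0-8 are hypothetical, chosen to match the example table. It uses `np.cumsum` for the running totals and `np.searchsorted` to find the first age whose running total reaches the middle position. For an even total it averages the two middle ages, which is what `np.median` would do on the expanded data.

```python
import numpy as np


def freq_median(ages, counts):
    """Median of a frequency table where ages[i] occurs counts[i] times.

    Assumes (hypothetically) that ages are sorted ascending, counts are
    non-negative integers, and at least one count is positive.
    """
    ages = np.asarray(ages)
    cum = np.cumsum(counts)  # running total of people up to and including each age
    n = int(cum[-1])         # total number of people
    # 1-indexed positions of the middle person (odd n) or the two middle people (even n)
    lo_pos = (n + 1) // 2
    hi_pos = n // 2 + 1
    # searchsorted (side='left') returns the first index whose running total >= position
    lo_age = ages[np.searchsorted(cum, lo_pos)]
    hi_age = ages[np.searchsorted(cum, hi_pos)]
    return (lo_age + hi_age) / 2


ages = range(9)  # assumed age columns 0-8
alabama = [34, 67, 89, 89, 67, 545, 4546, 3, 23]
print(sum(alabama))                # 5463 people, so the median is person 2732
print(freq_median(ages, alabama))  # 6.0
```

Because only the cumulative counts are stored, memory use depends on the number of age columns, not on the number of people.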
Maybe this works for you:
```python
import numpy as np
import pandas as pd

# create the frequency table: one row per city, one column per age value (0-8),
# each cell holding the number of people with that age
df = pd.DataFrame(
    [
        ['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23],
        ['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68]
    ],
    columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8]
).set_index('City')
print(df)

# calculate the median for each row of the frequency table
m = list()  # one median per city
for index, row in df.iterrows():
    v = list()  # expanded values: each age repeated once per person
    z = zip(row.index, row.values)  # (age, count) pairs
    for item in z:
        for f in range(item[1]):
            v.append(item[0])
    m.append(np.median(v))

df_m = pd.DataFrame({'City': df.index, 'Median': m})
print(df_m)
```
Input:
```text
           0   1   2   3   4    5     6    7   8
City
Alabama   34  67  89  89  67  545  4546    3  23
Georgia  345  65  67  32  23  567    87  647  68
```
Output:
```text
      City  Median
0  Alabama     6.0
1  Georgia     5.0
```
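The nested loops that build `v` one element at a time can become slow when the counts are large. `np.repeat` does the same expansion in a single call by repeating each age label as many times as its count. This is a sketch that assumes the same Alabama/Georgia table as above and that every count is a non-negative integer. The column labels are converted to integers explicitly because they come from a mixed `['City', 0, ...]` column list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ['Alabama', 34, 67, 89, 89, 67, 545, 4546, 3, 23],
        ['Georgia', 345, 65, 67, 32, 23, 567, 87, 647, 68],
    ],
    columns=['City', 0, 1, 2, 3, 4, 5, 6, 7, 8],
).set_index('City')

# Age values taken from the column labels, as integers
ages = df.columns.to_numpy(dtype=int)

# For each city, repeat every age by its count, then take the ordinary median
medians = df.apply(lambda counts: np.median(np.repeat(ages, counts.to_numpy())), axis=1)
print(medians)
```

This gives the same medians as the loop (6.0 for Alabama, 5.0 for Georgia). It still materializes one value per person, though, so for very large counts the running-total walk described earlier is lighter on memory.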