I have the following pandas dataframe.
import pandas as pd
import numpy as np
df = pd.DataFrame(
    {
        "bird_type": ["falcon", "crane", "crane", "falcon"],
        "avg_speed": [np.random.randint(50, 200) for _ in range(4)],
        "no_of_birds_observed": [np.random.randint(3, 10) for _ in range(4)],
        "reliability_of_data": [np.random.rand() for _ in range(4)],
    }
)
# The dataframe looks like this.
  bird_type  avg_speed  no_of_birds_observed  reliability_of_data
0    falcon         66                     3             0.553841
1     crane        159                     8             0.472359
2     crane        158                     7             0.493193
3    falcon        161                     7             0.585865
Now, I would like to compute the weighted average (weighted by no_of_birds_observed) of the avg_speed and reliability_of_data columns for each bird_type. For that I have the following simple function, which calculates a weighted average.
def func(data, numbers):
    ans = 0
    for a, b in zip(data, numbers):
        ans = ans + a*b
    ans = ans / sum(numbers)
    return ans
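As a quick sanity check, here is a minimal sketch that runs the function on the two falcon rows from the printed table. The values are hard-coded because the dataframe above is built from unseeded random numbers and will differ on every run. The comparison against `np.average` with its `weights` argument is my assumption about an equivalent built-in, not part of the original function.

```python
import numpy as np


def func(data, numbers):
    # Weighted average: sum(value * weight) / sum(weights)
    ans = 0
    for a, b in zip(data, numbers):
        ans = ans + a * b
    ans = ans / sum(numbers)
    return ans


# Falcon rows copied from the printed table (fixed values, not random)
falcon_speed = [66, 161]
falcon_counts = [3, 7]

loop_result = func(falcon_speed, falcon_counts)
numpy_result = np.average(falcon_speed, weights=falcon_counts)

print(loop_result, numpy_result)
```

Both calls should agree with the hand calculation (66*3 + 161*7)/(3+7).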
How can I apply func to both the avg_speed and reliability_of_data columns?
I expect the answer to be a dataframe like the following:
  bird_type  avg_speed  no_of_birds_observed  reliability_of_data
0    falcon      132.5                    10            0.5762578
# how: (66*3 + 161*7)/(3+7)     (3+7)    (0.553841*3 + 0.585865*7)/(3+7)
1     crane     158.53                    15            0.4820815
# how: (159*8 + 158*7)/(8+7)    (8+7)    (0.472359*8 + 0.493193*7)/(8+7)
I saw this question, but could not fully understand its solution or generalize it. I considered not asking, but according to this blog post by SO and this meta question, I think this question, with a different example, can be considered a "borderline duplicate". An answer would benefit me, and others will probably find it useful too, so I finally decided to ask.
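For reference, here is a minimal sketch of one way the expected table might be produced. It assumes that a `groupby` on bird_type followed by `apply` is acceptable. The helper name `weighted_summary` is hypothetical, and the data is hard-coded from the printed table so the result is reproducible.

```python
import pandas as pd

# Fixed values copied from the printed table (the original uses random data)
df = pd.DataFrame(
    {
        "bird_type": ["falcon", "crane", "crane", "falcon"],
        "avg_speed": [66, 159, 158, 161],
        "no_of_birds_observed": [3, 8, 7, 7],
        "reliability_of_data": [0.553841, 0.472359, 0.493193, 0.585865],
    }
)


def func(data, numbers):
    # Weighted average: sum(value * weight) / sum(weights)
    ans = 0
    for a, b in zip(data, numbers):
        ans = ans + a * b
    ans = ans / sum(numbers)
    return ans


def weighted_summary(group):
    # Hypothetical helper: one output row per bird_type
    weights = group["no_of_birds_observed"]
    return pd.Series(
        {
            "avg_speed": func(group["avg_speed"], weights),
            "no_of_birds_observed": weights.sum(),
            "reliability_of_data": func(group["reliability_of_data"], weights),
        }
    )


value_cols = ["avg_speed", "no_of_birds_observed", "reliability_of_data"]
result = (
    df.groupby("bird_type", sort=False)[value_cols]  # sort=False keeps falcon first
    .apply(weighted_summary)
    .reset_index()
)
# The mixed-type Series makes every column float, so restore the integer counts
result["no_of_birds_observed"] = result["no_of_birds_observed"].astype(int)

print(result)
```

Selecting the value columns before `apply` keeps the grouping column out of each group, and `reset_index` turns bird_type back into an ordinary column as in the expected output.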