Here I have a large dataset
For simplicity, I write a simplified example
import pandas as pd
import numpy as np
indices = (1,2,3,4,5,6)
cols = ["id", "region", "weight", "score1", "score2"]
data = (["xxx1", 1, 2, 10, 20], ["xxx2", 2, 5, 6, 8], ["xxx3", 1, 3, 9, 12], ["xxx4", 1, 3, 12, 20], ["xxx5", 1, 5, 5, 30], ["xxx6", 2, 10, 12, 20])
df = df = pd.DataFrame(data, index = indices, columns = cols)
df
which looks like this
I want to calculate the weighted average score using pandas and numpy packages only, and my ideal result should be a 2x3 table, whose rows are regions they belong to and columns are the weighted average score1 and score2. Below is my attempt
def cal(x, w):
assert len(x) == len(w)
nlist = len(x)
sum_weight = np.sum(w)
weight_total = 0
for i in range(nlist):
weight_total = weight_total + x[i] * w[i]
wavg = weight_total / sum_weight
return wavg
However, I think this method is too awkward. I am wondering if there is more convenient way to achieve the same goal. Any hint or help is welcome
the ideal result