I'm experimenting with Python/Pandas using a DataFrame having the following structure:

import pandas as pd
import numpy as np

df = pd.DataFrame({"item" : ["A", "B", "C", "D", "E"], 
                   "size_ratio" : [0.3, 0.9, 1, 0.4, 0.7], 
                   "weight_ratio" : [0.5, 0.7, 1, 0.5, np.nan], 
                   "power_ratio" : [np.nan, 0.3, 0.5, 0.1, 1]})

print(df)

  item  size_ratio  weight_ratio  power_ratio
0    A         0.3           0.5          NaN
1    B         0.9           0.7          0.3
2    C         1.0           1.0          0.5
3    D         0.4           0.5          0.1
4    E         0.7           NaN          1.0

As you can see, each item is described by three normalized metrics, namely: size_ratio, weight_ratio, and power_ratio. Also, NaN values are possible for each metric.

My goal is to combine these metrics together to create a global score (S) for each row. Specifically, the function I would like to apply/implement is the following:

[Image: formula for the global score S]

where

  • s_i are the individual scores;
  • w_i are user-defined weights associated with each metric;
  • alpha is a user-defined parameter (positive integer).

I want to be able to quickly adjust the weights and the parameter alpha to test different combinations.

As an example, setting w_1 = 3, w_2 = 2, w_3 = 1 and alpha = 5, the output should be the following:

  item  size_ratio  weight_ratio  power_ratio  global_score
0    A         0.3           0.5          NaN          0.36
1    B         0.9           0.7          0.3          0.88
2    C         1.0           1.0          0.5          0.99
3    D         0.4           0.5          0.1          0.44
4    E         0.7           NaN          1.0          0.70

Note that the denominator sums only the weights associated with the non-missing metrics (the same logic applies to the numerator).
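
The missing-value rule can be sketched on its own, independently of the exact formula. The snippet below is only an illustration: the names scores and effective_weights are made up for this example. It builds a per-row copy of the weights in which the weight of any NaN metric is set to zero, so it drops out of every sum.

```python
import numpy as np
import pandas as pd

# The three metric columns from the example DataFrame
scores = pd.DataFrame({
    "size_ratio":   [0.3, 0.9, 1, 0.4, 0.7],
    "weight_ratio": [0.5, 0.7, 1, 0.5, np.nan],
    "power_ratio":  [np.nan, 0.3, 0.5, 0.1, 1],
})
weights = np.array([3, 2, 1])  # w_1, w_2, w_3 from the example

# Broadcast the weights across all rows, then zero out the weight
# wherever the corresponding metric is NaN
effective_weights = np.where(scores.notna().to_numpy(), weights, 0)
print(effective_weights)
# [[3 2 0]
#  [3 2 1]
#  [3 2 1]
#  [3 2 1]
#  [3 0 1]]
```

Row A loses w_3 because power_ratio is missing, and row E loses w_2 because weight_ratio is missing.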

Being relatively new to the Python programming language, I started by searching for answers here. In this post, I learned how to compute row-wise operations on a pandas DataFrame with missing values, and in this post, I saw an example that uses a dictionary to set the weights.

Unfortunately, I was not able to apply what I found to my specific problem. Right now, I'm using Excel to run different simulations, but I would very much like to experiment with this in Python. Any help would be greatly appreciated.

glpsx

1 Answer

You could try something like this:

import pandas as pd
import numpy as np

def global_score(scores, weights, alpha):
    # work on plain NumPy arrays so the NaN mask is purely positional
    scores = np.asarray(scores, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # if we have nan values, remove them (and their weights) before calculating the score
    valid = ~np.isnan(scores)
    scores, weights = scores[valid], weights[valid]
    # calculate the score
    numer = np.sum((scores * weights)**alpha)**(1/alpha)
    denom = np.sum(weights**alpha)**(1/alpha)
    return numer/denom

weights = [3, 2, 1]
alpha = 5

df = pd.DataFrame({"item" : ["A", "B", "C", "D", "E"], 
                   "size_ratio" : [0.3, 0.9, 1, 0.4, 0.7], 
                   "weight_ratio" : [0.5, 0.7, 1, 0.5, np.nan], 
                   "power_ratio" : [np.nan, 0.3, 0.5, 0.1, 1]})

# only utilize the 3 score columns for the calculation 
df['global_score'] = df[['size_ratio','weight_ratio','power_ratio']].apply(lambda x: global_score(x, weights, alpha), axis=1)
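
Written out in the question's notation (scores s_i, weights w_i, parameter alpha), the global_score function above evaluates the expression below. This is read directly off the code, and the sums run only over the metrics that are not NaN in the given row.

```latex
S = \frac{\left( \sum_{i} \left( w_i \, s_i \right)^{\alpha} \right)^{1/\alpha}}
         {\left( \sum_{i} w_i^{\alpha} \right)^{1/\alpha}}
```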

The global_score function removes any NaN scores, together with their weights, before running the calculation. With axis=1, apply calls the function once per row, and selecting df[['size_ratio','weight_ratio','power_ratio']] ensures that only the numeric columns of interest are passed to global_score.
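
If you want to try many weight/alpha combinations on a larger DataFrame, a vectorized variant may be more convenient than apply. The following is a minimal sketch under a few assumptions: the function name global_score_vectorized is made up for this example, and the weights are passed as a dictionary keyed by column name (as suggested in the question). It computes the same expression as global_score, for all rows at once.

```python
import numpy as np
import pandas as pd

def global_score_vectorized(df, weights, alpha):
    """Compute the score for every row at once.

    `weights` is a dict mapping metric column name -> weight (hypothetical interface).
    """
    cols = list(weights)
    s = df[cols].to_numpy(dtype=float)                    # shape (n_rows, n_metrics)
    w = np.array([weights[c] for c in cols], dtype=float)
    present = ~np.isnan(s)
    # Where a score is missing, zero out both the weighted score and the weight
    weighted = np.where(present, s * w, 0.0)
    w_eff = np.where(present, w, 0.0)
    numer = np.sum(weighted ** alpha, axis=1) ** (1 / alpha)
    denom = np.sum(w_eff ** alpha, axis=1) ** (1 / alpha)
    return pd.Series(numer / denom, index=df.index)

df = pd.DataFrame({"item": ["A", "B", "C", "D", "E"],
                   "size_ratio": [0.3, 0.9, 1, 0.4, 0.7],
                   "weight_ratio": [0.5, 0.7, 1, 0.5, np.nan],
                   "power_ratio": [np.nan, 0.3, 0.5, 0.1, 1]})

weights = {"size_ratio": 3, "weight_ratio": 2, "power_ratio": 1}
df["global_score"] = global_score_vectorized(df, weights, alpha=5)
print(df)
```

Changing a weight or alpha is then a one-line edit to the dictionary or the call. A row in which every metric is NaN would produce NaN (0 divided by 0), so handle that case separately if it can occur in your data.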

Denver