
I have a dataframe with several columns. I'd like to apply a transformation to one column and use the result as the weights for a weighted sum of the other columns. My current approach takes too long. Is there a faster way to do this?

I currently calculate a new column, transpose, and use df.dot, as almost all the answers suggest. The problem is that my dataframe is extremely large, so this method takes a long time.

For example, given the following df:

col1  col2  col3
 0.1   0.2   0.3
 1.4   1.5   1.6
 1.9   1.8   1.7

I create a new column, weight, that is 1/col3:

col1  col2  col3  weight
 0.1   0.2   0.3   3.333
 1.4   1.5   1.6   0.625
 1.9   1.8   1.7   0.588

and then I transpose and use df.dot against the weight column to get:

col1  col2
2.32  2.66
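For reference, the steps described above might look roughly like the sketch below. This is a guess at the approach based on the description, not the actual code in use; the column names are taken from the example.

```python
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "col1": [0.1, 1.4, 1.9],
    "col2": [0.2, 1.5, 1.8],
    "col3": [0.3, 1.6, 1.7],
})

# New weight column: reciprocal of col3
df["weight"] = 1 / df["col3"]

# Transpose the value columns to shape (2, 3), then take the dot
# product with the weight Series (length 3, aligned on the row index)
result = df[["col1", "col2"]].T.dot(df["weight"])
print(result)
```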
2easy

1 Answer


I checked the linked answers, and they use DataFrame.dot rather than np.dot. I hope np.dot is faster, but with large DataFrames and limited RAM it will probably still be slow:

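import numpy as np
import pandas as pd

# Weights: reciprocal of col3 (df is the DataFrame from the question)
w = 1 / df.col3
# Weighted sum of every column, computed on the underlying NumPy arrays
arr = np.dot(df.to_numpy().T, w.to_numpy())

df1 = pd.DataFrame([arr], columns=df.columns)
print(df1)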
      col1     col2  col3
0  2.32598  2.66299   3.0
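If RAM is the bottleneck, two variations may be worth trying; both are sketches under assumptions, not benchmarked results. The first multiplies the weight vector by the value columns directly (w @ X), which skips the transpose and leaves col3 out of the result. The second accumulates the same sum over row chunks so that only one chunk is converted to a NumPy array at a time. The names value_cols and weighted_sum_chunked are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Example data from the question
df = pd.DataFrame({
    "col1": [0.1, 1.4, 1.9],
    "col2": [0.2, 1.5, 1.8],
    "col3": [0.3, 1.6, 1.7],
})

value_cols = ["col1", "col2"]  # hypothetical name: the columns to sum

# Variation 1: vector-matrix product, no transpose, weight column excluded
w = 1.0 / df["col3"].to_numpy()
arr = w @ df[value_cols].to_numpy()  # shape (2,)
out = pd.Series(arr, index=value_cols)


# Variation 2 (hypothetical helper): accumulate the sum chunk by chunk
def weighted_sum_chunked(frame, cols, weight_col, chunk_size=1_000_000):
    total = np.zeros(len(cols))
    for start in range(0, len(frame), chunk_size):
        chunk = frame.iloc[start:start + chunk_size]
        w_chunk = 1.0 / chunk[weight_col].to_numpy()
        total += w_chunk @ chunk[cols].to_numpy()
    return pd.Series(total, index=cols)


# A tiny chunk size just to exercise the loop on the example data
chunked = weighted_sum_chunked(df, value_cols, "col3", chunk_size=2)
print(out)
print(chunked)
```

Both variations should agree with the np.dot result above for col1 and col2. Whether either is actually faster depends on the dtypes and memory layout of the real data, so it is worth timing them on a sample first.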
jezrael