
I have the following Dataframe:

Index  Plassering  Average FGrating
22943           1            100.43
22944           2             93.50
22945           3            104.60
22746           4            101.30
22947           1            102.05
22948           2            107.35
22949           3            109.12

I am trying to apply the softmax function to the Average FGrating column across the entire DataFrame, separately for each run of increasing Plassering values. That is, I want to apply softmax to the first four rows, then separately to the next three rows, and so on.

The entire DataFrame, which has about 5000 rows, is structured like this.

My first attempt cycles through the rows of this DataFrame with iterrows(): while Plassering is increasing, each Average FGrating value is appended to a list. When the Plassering value is smaller than the value in the previous row, I compute the softmax of the list, empty the list, and the cycle goes on. However, I have read that iterating over rows like this is not a good idea, performance-wise.
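For reference, the loop described above might look roughly like the sketch below. The helper names `_softmax` and `softmax_by_run_loop` are made up for illustration, and the softmax is written out with NumPy rather than taken from a library. It is meant only as a baseline to compare against, not as a recommended approach.

```python
import numpy as np
import pandas as pd

def _softmax(values):
    # Plain softmax; subtracting the max keeps exp() numerically stable.
    x = np.asarray(values, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def softmax_by_run_loop(df):
    # Hypothetical helper: softmax over each run of rows, where a new run
    # starts whenever Plassering is smaller than in the previous row.
    out, run, prev = [], [], None
    for _, row in df.iterrows():
        if prev is not None and row["Plassering"] < prev:
            out.extend(_softmax(run))  # close the finished run
            run = []
        run.append(row["Average FGrating"])
        prev = row["Plassering"]
    if run:
        out.extend(_softmax(run))      # close the last run
    return pd.Series(out, index=df.index, name="soft_max")

# Sample data copied from the table above.
df = pd.DataFrame({
    "Index": [22943, 22944, 22945, 22746, 22947, 22948, 22949],
    "Plassering": [1, 2, 3, 4, 1, 2, 3],
    "Average FGrating": [100.43, 93.5, 104.6, 101.3, 102.05, 107.35, 109.12],
})
result = softmax_by_run_loop(df)
print(result)
```

Each run's values should sum to 1, which is a quick sanity check on the output.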

Do you have any better ideas than mine?

Bogdan Doicin

2 Answers


Based on consecutive differences of Plassering values (accumulated into separate groups) and the pandas.core.groupby.DataFrameGroupBy.transform operation:

from scipy.special import softmax

df['soft_max'] = (df.groupby(df['Plassering'].diff().ne(1).cumsum())
                  ['Average FGrating'].transform(softmax)) 

   Index  Plassering  Average FGrating  soft_max
0  22943           1            100.43  0.014684
1  22944           2             93.50  0.000014
2  22945           3            104.60  0.950254
3  22746           4            101.30  0.035048
4  22947           1            102.05  0.000726
5  22948           2            107.35  0.145437
6  22949           3            109.12  0.853837
RomanPerekhrest

You can use a groupby transformation. First generate the groups, then apply your softmax:

import pandas as pd
from scipy.special import softmax

df = pd.read_clipboard() # Your df here

groups = df["Plassering"].diff().lt(0).cumsum()
out = df["Average FGrating"].groupby(groups).transform(softmax)

Results:

>>> groups
0    0
1    0
2    0
3    0
4    1
5    1
6    1
Name: Plassering, dtype: int32
>>> out
0    0.014684
1    0.000014
2    0.950254
3    0.035048
4    0.000726
5    0.145437
6    0.853837
Name: Average FGrating, dtype: float64
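One thing worth noting: the grouping key `diff().lt(0).cumsum()` used here and the key `diff().ne(1).cumsum()` give the same groups on the sample data, but they can differ when Plassering skips a value. A small sketch with made-up data (the series `s` is not from the question) illustrates this:

```python
import pandas as pd

# Made-up Plassering values with a gap (2 -> 4) inside the first run.
s = pd.Series([1, 2, 4, 1, 2], name="Plassering")

# New group whenever the step from the previous row is not exactly +1.
groups_ne1 = s.diff().ne(1).cumsum()

# New group only when the value is smaller than in the previous row.
groups_lt0 = s.diff().lt(0).cumsum()

print(groups_ne1.tolist())  # [1, 1, 2, 3, 3]
print(groups_lt0.tolist())  # [0, 0, 0, 1, 1]
```

If the data can contain gaps or repeated values, the `lt(0)` key seems to match the rule in the question more closely (reset when the value is smaller than the previous one).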
Chrysophylaxs
    Thank you, both you and @RomanPerekhrest, for your answers. I chose yours because it's a bit easier for me to understand. – Bogdan Doicin Apr 22 '23 at 14:05