5

I find it hard to explain with words what I want to achieve, so please don't judge me for showing a simple example instead. I have a table that looks like this:

main_col some_metadata value
this True 10
this False 3
that True 50
that False 10
other True 20
other False 5

I want to normalize this data separately for each case of main_col. For example, if we're to choose min-max normalization and scale it to range [0; 100], I want the output to look like this:

main_col some_metadata value (normalized)
this True 100
this False 30
that True 100
that False 20
other True 100
other False 25

Where for each case of main_col, the highest value is scaled to 100 and another value is scaled in respective proportion.

Max Skoryk
  • 404
  • 2
  • 10

2 Answers2

5

You can use groupby.transform('max') to get the max per group, then normalize in place:

df['value'] /= df.groupby('main_col')['value'].transform('max').div(100)

or:

df['value'] *= df.groupby('main_col')['value'].transform('max').rdiv(100)

output:

  main_col  some_metadata  value
0     this           True  100.0
1     this          False   30.0
2     that           True  100.0
3     that          False   20.0
4    other           True  100.0
5    other          False   25.0
mozway
  • 194,879
  • 13
  • 39
  • 75
  • 1
    I've just realized both of our solutions assume the numbers are positive. In case of any negative numbers, the result can be highly skewed. For instance if `this False` is `-10` instead of `3` the result will be `-100` instead of `0`. – Nuri Taş Sep 27 '22 at 14:12
  • 1
    @NuriTaş I don't think this should be the case as it's not a min/max scaling but a max scaling, 0 being the implicit min. – mozway Sep 27 '22 at 14:13
  • @nuri-taş that's true. But if there were negative values it's not the implementation that would be wrong, this whole normalization just wouldn't make any sense. – Max Skoryk Sep 27 '22 at 14:37
1

The normalization formula you are looking for is 100 * (x / x.max()):

df.groupby(['main_col'])['value'].transform(lambda x: 100 * (x / x.max()))

Result:

0    100.0
1     30.0
2    100.0
3     20.0
4    100.0
5     25.0
Name: value, dtype: float64
Nuri Taş
  • 3,828
  • 2
  • 4
  • 22