I'm trying to normalize a variable (using its minimum and maximum values) separately within each level of a second variable (a factor).

It'll be clearer using the diamonds data frame (from ggplot2) as an example.

This normalizes the carat variable to the 0-1 interval:

library(ggplot2)  # provides the diamonds dataset
di <- diamonds
di$caratn <- (di$carat - min(di$carat)) / (max(di$carat) - min(di$carat))

But I would like to do the normalization separately for each level of the clarity variable (which is a factor). That is, take all carat values of a given clarity and normalize them to the 0-1 interval.

As a result, the highest carat within clarity SI2 would have a value of 1 and the lowest a value of 0, and likewise for every other clarity level.

Rashwan L
xgrau

1 Answer

Here's a solution using ave(), which applies a function to carat within each level of clarity:

di <- within(di, caratn <- ave(carat, clarity, FUN = function(x) (x - min(x)) / diff(range(x))))
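
To see the grouped scaling on something small enough to check by hand, here is a minimal base R sketch. The `toy` data frame and the `rescale01` helper are made up for illustration (they are not part of ggplot2 or of the code above). The helper also guards against one edge case: a group whose values are all identical has a range of zero, so the plain formula would divide 0 by 0 and give NaN. Returning 0 for such a group is an assumption about what is wanted, not the only reasonable choice.

```r
# Toy data standing in for diamonds (hypothetical values)
toy <- data.frame(
  carat   = c(0.2, 0.5, 1.0, 0.3, 0.9, 0.7),
  clarity = factor(c("SI2", "SI2", "SI2", "VS1", "VS1", "IF"))
)

# Hypothetical helper: min-max scaling that returns 0 for a constant
# group instead of the NaN produced by a 0/0 division
rescale01 <- function(x) {
  span <- diff(range(x))
  if (span == 0) rep(0, length(x)) else (x - min(x)) / span
}

# ave() splits carat by clarity, applies rescale01 to each piece,
# and puts the results back in the original row order
toy$caratn <- ave(toy$carat, toy$clarity, FUN = rescale01)

# Per-group check: minimum and maximum of the scaled values
tapply(toy$caratn, toy$clarity, range)
```

In this toy data the SI2 and VS1 groups each run from 0 to 1, while the single IF row gets 0 because its group has no spread.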
Sam Dickson