I am trying to split a dataframe, create a new variable in each dataframe list object, and reassemble (unsplit) the original dataframe.

The new variable I am trying to create scales the variable B.2 from 0 to 1 for each factor level in the variable Type.

BWRX$B.2 <- BWRX$B #Create a new version of B
BWRX.Split <- split(BWRX, BWRX$Type) #Split by Type
BWRX.Split.BScaled <-lapply(BWRX.Split, function(df){df$B.3 <- (df$B.2-min(df$B.2))/(max(df$B.2)-min(df$B.2))}) #Scale B.2

The above code returns a list with the values of B.2 correctly scaled within each factor level. The tricky part is that I cannot figure out how to add this variable to each dataframe in BWRX.Split.

I thought assigning to df$B.3 inside the function would take care of this, but it has not. Once B.3 is part of each dataframe, can unsplit(, Type) be used to reassemble the dataframes, or would do.call be better? I was trying to combine unsplit and split so everything would be in one line of code. Is there a more efficient method?

RTrain3k
  • Please provide a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) to make it clear what the input is and what the desired output is. – MrFlick Aug 11 '16 at 02:30

2 Answers

We don't really need to split it; this can be done using ave from base R. The advantage is that the new column will be added in the original row order of the dataset.

transform(BWRX, BScaled = ave(B.2, Type, 
        FUN = function(x) (x- min(x))/(max(x)- min(x))))
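
As a quick check of what ave does here, below is a minimal sketch on a toy data frame. The data values and the helper name rescale01 are made up for illustration; only the column names B.2 and Type come from the question.

```r
# Hypothetical toy data standing in for BWRX (values are invented)
BWRX <- data.frame(
  Type = factor(c("a", "b", "a", "b", "a", "b")),
  B.2  = c(1, 10, 3, 20, 5, 40)
)

# Hypothetical helper: min-max scale a numeric vector to [0, 1]
rescale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# ave() applies the function within each level of Type and returns
# the results in the original row order
BWRX$BScaled <- ave(BWRX$B.2, BWRX$Type, FUN = rescale01)
BWRX
```

Note that if every value within a group is identical, max(x) - min(x) is 0 and the result for that group is NaN, so that case may need separate handling.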

This is a group-by operation, so it can be done efficiently with data.table or dplyr.

library(data.table)
setDT(BWRX)[, BScaled := (B.2 - min(B.2))/(max(B.2) - min(B.2)), by = Type]
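
For completeness, a dplyr version might look like the sketch below. It assumes the dplyr package is installed, and the toy data is invented for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for BWRX (values are invented)
BWRX <- data.frame(
  Type = c("a", "b", "a", "b"),
  B.2  = c(1, 10, 5, 30)
)

# group_by() + mutate() computes min and max within each Type;
# mutate() keeps the rows in their original order
BWRX <- BWRX %>%
  group_by(Type) %>%
  mutate(BScaled = (B.2 - min(B.2)) / (max(B.2) - min(B.2))) %>%
  ungroup()
```

The ungroup() at the end is there so that later operations on BWRX are not silently done per group.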
akrun
  • Would there be any reason to use `transform()` instead of `data.frame()`? The [R documentation](https://stat.ethz.ch/R-manual/R-devel/library/base/html/transform.html) for `transform()` is pretty ominous. – RTrain3k Aug 11 '16 at 16:48
  • @user3290799 It is just to return the output without changing the original object. If you assign the result back to the original object `BWRX`, the change will be reflected in that object – akrun Aug 12 '16 at 03:59

As you mentioned and MrFlick confirmed, you can simply unsplit() it:

BWRX$B.3 <- unsplit(BWRX.Split.BScaled,BWRX$Type)

To do this in a single line:

BWRX$B.3 <- unsplit(lapply(split(BWRX$B.2, BWRX$Type), function(x)(x-min(x))/(max(x)-min(x))),BWRX$Type)

But akrun's solutions are both quicker.
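
As for why df$B.3 did not end up in each piece: the value of an assignment in R is the assigned value, so a function whose last expression is `df$B.3 <- ...` returns the scaled vector rather than the modified data frame. If you do want the column added to each piece, a sketch along these lines should work; the toy data and the name BWRX.Scaled are made up for illustration.

```r
# Hypothetical toy data standing in for BWRX (values are invented)
BWRX <- data.frame(
  Type = factor(c("a", "b", "a", "b", "a")),
  B.2  = c(2, 10, 4, 30, 6)
)

BWRX.Split <- split(BWRX, BWRX$Type)  # one data frame per Type

BWRX.Split.BScaled <- lapply(BWRX.Split, function(df) {
  df$B.3 <- (df$B.2 - min(df$B.2)) / (max(df$B.2) - min(df$B.2))
  df  # return the whole data frame, not just the assigned vector
})

# unsplit() also accepts a list of data frames and restores the
# original row order
BWRX.Scaled <- unsplit(BWRX.Split.BScaled, BWRX$Type)
```

do.call(rbind, BWRX.Split.BScaled) would also reassemble the pieces, but it stacks them group by group, so the rows would no longer be in their original order.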

HubertL