
I have a data frame similar to this one:

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

Now I would like to scale the values in p1 and p2 within each ID. The whole column should not be scaled at once (which is what I get when using the tapply() function); instead, scaling should be done once for all values with ID 1, then once for all values with ID 2, and so on. The same goes for p2. The new data frame should contain the scaled values.

I already tried

df_scaled <- ddply(my.df, my.df$ID, scale(my.df$p1))

but get the error message

.fun is not a function.

Thanks for your help!

GNee

1 Answer


dplyr makes this easy:

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543,3524, 353, 3432, 4542, 6343, 4534 )
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

library(dplyr)
df_scaled <- my.df %>% group_by(ID) %>% mutate(p1 = scale(p1), p2=scale(p2))

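Note that the stable version of dplyr at the time of writing (0.5.0, going by the comments) has a bug when working with scale(), which returns a one-column matrix rather than a plain vector. You might need to update to the dev version, or wrap each scale() call in as.vector() (see comments).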
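If updating dplyr is not an option, the workaround reported in the comments is to coerce the result of scale() to a plain vector. Below is a minimal sketch of that, assuming the my.df from the question. The ungroup() call and the summarise() check at the end are additions for illustration only and are not part of the original answer.

```r
library(dplyr)

ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

# scale() returns a one-column matrix; as.vector() turns it back into
# a plain numeric vector so the result is an ordinary column.
df_scaled <- my.df %>%
  group_by(ID) %>%
  mutate(p1 = as.vector(scale(p1)),
         p2 = as.vector(scale(p2))) %>%
  ungroup()

# Sanity check: within each ID the scaled columns should have
# a mean of (approximately) 0 and a standard deviation of 1.
check <- df_scaled %>%
  group_by(ID) %>%
  summarise(mean_p1 = mean(p1), sd_p1 = sd(p1),
            mean_p2 = mean(p2), sd_p2 = sd(p2))
print(check)
```

Running sapply(df_scaled, class) afterwards should report plain numeric columns rather than matrices.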

mpjdem
  • or more generic `my.df %>% group_by(ID) %>% mutate_at(vars(matches('p')), funs(scale))` – Sotos Jan 20 '17 at 10:17
  • Thank you. It works on the dataframe I put up here as an example but with the real dataframe I get the Error: unexpected '=' in "scaled_data <- predictortable_panel %>% group_by(predictortable_panel$ID) %>% mutate(predictortable_panel$p1 =" --- any idea why it won't take the equal sign? – GNee Jan 20 '17 at 10:30
  • You shouldn't repeat the name of the data frame inside the `dplyr` functions (i.e. remove `predictortable_panel$`); `mutate(p1 = ...)` etc. should work. – mpjdem Jan 20 '17 at 10:54
  • @mpjdem running your code gives me an issue. Not an error right away, but if I try to run `View(df_scaled)` afterwards I get an error: `dims [product 5] do not match the length of object [15]`. If I add `as.vector` before each of the `scale` calls, it fixes the problem (I thought to try because `scale` outputs a matrix rather than a vector). Not sure if this is universal, I'm on `R-3.3.2`, `dplyr-0.5.0`. – rosscova Jan 20 '17 at 11:40
  • I can confirm (by `sapply(df_scaled,class)`) that the columns are matrices, which I think is what `View` is struggling with (I thought it would be OK, since list columns usually work fine). Coercing each to vector, either in the `mutate` call or afterwards resolves it. – rosscova Jan 20 '17 at 11:46
  • Your problem might be related to this, which seems to be a bug: http://stackoverflow.com/questions/35775696/trying-to-use-dplyr-to-group-by-and-apply-scale – mpjdem Jan 20 '17 at 11:48
  • I was convinced this problem must be in `scale` rather than `dplyr`, but tried installing the dev version of `dplyr` anyway, and you're right; that fixed it. Thanks for clearing it up for me @mpjdem – rosscova Jan 20 '17 at 12:33
  • I edited it into the answer, for future googlers. Feel free to accept the answer if it was helpful :) – mpjdem Jan 20 '17 at 15:30
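
For readers who want to avoid the package version issue discussed in the comments altogether, group-wise scaling can also be sketched in base R. This is an alternative to the dplyr answer above, not something proposed in the thread; it assumes the my.df from the question and uses ave(), which applies a function within each group and returns a vector aligned with the original rows.

```r
ID <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3)
p1 <- c(21000, 23400, 26800, 2345, 23464, 34563, 456433, 56543, 34543, 3524, 353, 3432, 4542, 6343, 4534)
p2 <- c(234235, 2342342, 32, 23432, 23423, 2342342, 34, 2343, 23434, 23434, 34, 234, 2343, 34, 5)
my.df <- data.frame(ID, p1, p2)

df_scaled_base <- my.df
for (col in c("p1", "p2")) {
  # Scale each column separately within every ID; as.vector() drops
  # the matrix structure that scale() returns.
  df_scaled_base[[col]] <- ave(my.df[[col]], my.df$ID,
                               FUN = function(x) as.vector(scale(x)))
}

# Per-ID means of the scaled p1 values (should all be close to 0)
print(tapply(df_scaled_base$p1, df_scaled_base$ID, mean))
```

The loop over column names keeps the sketch short; with many columns, the vector c("p1", "p2") could be replaced by any character vector of column names to scale.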