
I would like to write a function which takes a list of variables out of a dataframe, say:

df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))

The function should apply the same calculation to each variable, for example dividing it by its standard deviation:

test.function <- function(var){ 
  for (i in var) {
  paste0(i, "_per_sd") <-  i / sd(i)
  }
  }

The goal is to create a new variable a_per_sd that holds a divided by its standard deviation. Unfortunately, I am stuck: the code fails with Error in paste0(i, "_per_sd") <- i/sd(i) : could not find function "paste0<-".
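For context on the error: R reads `f(x) <- value` as a call to a replacement function named `f<-`, and no `paste0<-` exists. The sketch below shows one way around it, assigning through `[[<-`, which accepts a computed column name. It loops over column names given as strings rather than over the column values, which is an assumption about how the variables would be passed.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))

# Loop over column names (strings), not over the column values.
for (nm in c("a", "b")) {
  # `[[<-` accepts a computed name, unlike a bare `paste0(...) <-`.
  df[[paste0(nm, "_per_sd")]] <- df[[nm]] / sd(df[[nm]])
}

df$a_per_sd
#> [1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777
```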

The expected usage should be:

test.function(df$a, df$b)

The expected result should be:

> df$a_per_sd
[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777

The same should happen for every other variable that is passed in. I suspect I need as.formula and/or eval, but I may be approaching this the wrong way. Thank you very much for your attention and help.

Medomatto
  • Where do you get the `sd`s? Getting `sd` of the same value doesn't seem intuitive to me. – NelsonGon Jul 23 '20 at 09:20
  • The standard deviation should be calculated on the entire column (i in this case in the loop). In this case the sd of the entire df$a is 1.581139, and I would like every single row of the variable to be divided by sd. – Medomatto Jul 23 '20 at 10:16

4 Answers


Is this what you are after?

df <- data.frame(a = c(1,2,3,4,5), b = c(6,7,8,9,10))

test.function <- function(...){
    x <- list(...)
    xn <- paste0(unlist(eval(substitute(alist(...)))),
                 "_per_sd")
    setNames(lapply(x, function(y) y/sd(y)), xn)
}

cbind(df, test.function(df$a, df$b))
#>   a  b df$a_per_sd df$b_per_sd
#> 1 1  6   0.6324555    3.794733
#> 2 2  7   1.2649111    4.427189
#> 3 3  8   1.8973666    5.059644
#> 4 4  9   2.5298221    5.692100
#> 5 5 10   3.1622777    6.324555

Created on 2020-07-23 by the reprex package (v0.3.0)
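The column names above come out as `df$a_per_sd` because the whole argument expression is turned into text. If plain names such as `a_per_sd` are preferred, a variant along these lines might work. This is only a sketch: `per_sd` is a hypothetical name, and it assumes every argument is written in the `df$column` form.

```r
per_sd <- function(...) {
  x <- list(...)
  # Capture the unevaluated argument expressions, e.g. `df$a`.
  exprs <- as.list(substitute(list(...)))[-1]
  labels <- vapply(exprs, function(e) paste(deparse(e), collapse = ""),
                   character(1))
  # Drop everything up to and including the last `$`.
  labels <- sub("^.*\\$", "", labels)
  setNames(lapply(x, function(y) y / sd(y)), paste0(labels, "_per_sd"))
}

df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))
out <- cbind(df, per_sd(df$a, df$b))
names(out)
#> [1] "a"        "b"        "a_per_sd" "b_per_sd"
```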

user12728748
  • Thank you very much, that did the trick!! Very elegant to use (...) for many parameters in functions... did not know this. Thanks again – Medomatto Jul 23 '20 at 17:21

The question is not completely clear to me, but you might get sd of rows/columns or vectors by these approaches:

apply(as.matrix(df), MARGIN = 1, FUN = sd) #across rows
#[1] 3.535534 3.535534 3.535534 3.535534 3.535534

apply(as.matrix(df), MARGIN = 2, FUN = sd) #across columns
#       a        b 
#1.581139 1.581139 

lapply(df, sd) #if you provide list of vectors (columns of `df` in this case)
#$a
#[1] 1.581139
#
#$b
#[1] 1.581139
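
Building on the per-column standard deviations, a possible next step is to divide every column by its own sd in one call. The sketch below uses base R's `scale()` with centering switched off; passing `scale = sapply(df, sd)` explicitly matters, because with `center = FALSE` the default scaling factor is the root mean square rather than the standard deviation. The `_per_sd` column names are chosen here only to match the question.

```r
df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))

# Divide each column by its own standard deviation, without centering.
scaled <- scale(df, center = FALSE, scale = sapply(df, sd))

# scale() returns a matrix; rename its columns and attach them to df.
colnames(scaled) <- paste0(colnames(scaled), "_per_sd")
result <- cbind(df, as.data.frame(scaled))
```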
matushiq
  • Thank you very much for your help. However, with this approach I cannot pass a specific variable of a dataframe and I would like that specific variable to be renamed and divided by the standard deviation of the entire column. – Medomatto Jul 23 '20 at 10:15

I got this far. Is this what you are looking for?

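test.function <- function(var)
{
  # Build the name from the argument expression (e.g. "df$a"), not from its values
  newvar <- paste0(deparse(substitute(var)), "_per_sd")
  # Create a local variable with that name, then return its value
  assign(newvar, var / sd(var))
  get(newvar)
}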

Input:

test.function(df$a)

Result:

[1] 0.6324555 1.2649111 1.8973666 2.5298221 3.1622777

I got the idea from here: Assignment using get() and paste()

writer_typer
  • Dear Type Writer, thank you very much. This is very close... I still need two features: 1. a loop to accept more than one variable, like test.function(df$a, df$b...), and 2. the possibility to include the new variable in the existing dataframe df. If we could achieve this too, it would be great! Thanks again. – Medomatto Jul 23 '20 at 11:49
  • I'm sure there is a way, but I'm still learning. This is a helpful link: https://stackoverflow.com/questions/48694626/creating-a-function-in-r-with-variable-number-of-arguments – writer_typer Jul 23 '20 at 12:15

In the end, this is what my code looks like:

test.function <- function(...){
    x <- list(...)
    xn <- paste0(unlist(eval(substitute(alist(...)))),
                 "_per_sd")
    setNames(lapply(x, function(y) y/sd(y, na.rm = TRUE)), xn)
}

test.function.wrap <- function(..., dataframe) {
    assign(deparse(substitute(dataframe)),
           cbind(dataframe, test.function(...)),
           envir = .GlobalEnv)
}


test.function.wrap(df$a, df$b , dataframe = df)

To assign the new variables to the existing dataframe, I combined the (absolutely genius) tips and wrapped the function in a second function that overwrites the dataframe in the global environment. I am aware it might not be as elegant, but it does the job!
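
An alternative that avoids writing into `.GlobalEnv` is to return the modified data frame and reassign it at the call site. The following is a minimal sketch, not the approach used above: `add_per_sd` and its `cols` argument are hypothetical names, and the columns are passed as strings instead of `df$a` expressions.

```r
add_per_sd <- function(data, cols, na.rm = TRUE) {
  for (nm in cols) {
    # Add one new column per requested variable.
    data[[paste0(nm, "_per_sd")]] <- data[[nm]] / sd(data[[nm]], na.rm = na.rm)
  }
  data  # return the augmented copy; the caller decides where to store it
}

df <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10))
df <- add_per_sd(df, c("a", "b"))
```

Returning the value keeps the function free of side effects, so it also works on data frames that do not live in the global environment.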

Dharman
Medomatto