Background

This is an attempt to improve a previous question. The idea is to create a function that takes a data frame and, optionally, a vector of variable names. The function iterates over the variables in the data frame and transforms the numeric ones. If the vector of names is also passed, only the variables it lists are transformed.

Tools used

  • To create an "optional" argument I used the missing() function.

  • The syntax for iterating over the vector was inspired by an earlier discussion.
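
For readers unfamiliar with the pattern, here is a minimal sketch of how missing() makes an argument optional. The helper name describe_arg is hypothetical and is not part of the code in this question.

```r
# Hypothetical helper, only to illustrate missing(); not part of the question's code
describe_arg <- function(data_frame, listofvars) {
  if (missing(listofvars)) {
    # the caller did not supply the second argument
    "all columns"
  } else {
    paste(listofvars, collapse = ", ")
  }
}

describe_arg(iris)
describe_arg(iris, c("Petal.Length", "Petal.Width"))
```

A common alternative is to give the argument a default of NULL and test it with is.null(), which makes the optional nature visible in the function signature.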

Code & where I am stuck:

transformDivideThousand <- function(data_frame, listofvars){
    if (missing(listofvars)) {
        data_frame[, sapply(data_frame, is.numeric)] =
        data_frame[, sapply(data_frame, is.numeric)]/1000
    } else {
        for (i in names(data_frame)) {
            for (i in listofvars) {
                data_frame[[i]]<-data_frame[[i]]/1000
            }
        }
    }
    return(data_frame)
}

The call would look like:

test <- transformDivideThousand(cases, c("col2", "col3", "col15"))

Question

  • What am I getting wrong in this code? I managed to make the optional argument work, but when I test the function with a list of names, the listed variables are converted to zeros.

Cautionary suggestion

  • If you are down-voting the question, at the very least explain why!
lf_araujo
  • Maybe `if (i %in% listofvariables) {` ? See [set](https://stat.ethz.ch/R-manual/R-devel/library/base/html/sets.html) for more info. – zx8754 Apr 28 '16 at 06:17

2 Answers

You may do:

# data
head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

The function:

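foo_divide <- function(x, y){
  # divide a column by 1000 if it is numeric, otherwise return it unchanged
  foo <- function(col) if (is.numeric(col)) col / 1000 else col
  if (missing(y)) y <- seq_len(ncol(x)) # default to all columns if y is missing
  x[y] <- lapply(x[y], foo) # list-style indexing also works for a single column
  x # return the modified data frame
}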

Without listofvars:

head(foo_divide(iris))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1       0.0051      0.0035       0.0014       2e-04  setosa
2       0.0049      0.0030       0.0014       2e-04  setosa
3       0.0047      0.0032       0.0013       2e-04  setosa
4       0.0046      0.0031       0.0015       2e-04  setosa
5       0.0050      0.0036       0.0014       2e-04  setosa
6       0.0054      0.0039       0.0017       4e-04  setosa

With listofvars:

head(foo_divide(iris, c("Petal.Length", "Petal.Width", "Species")))
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5       0.0014       2e-04  setosa
2          4.9         3.0       0.0014       2e-04  setosa
3          4.7         3.2       0.0013       2e-04  setosa
4          4.6         3.1       0.0015       2e-04  setosa
5          5.0         3.6       0.0014       2e-04  setosa
6          5.4         3.9       0.0017       4e-04  setosa

You can also use a numeric vector to specify the columns:

foo_divide(iris, 1:3)
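
One edge case worth keeping in mind when selecting columns by name or index is a selection of exactly one column. The sketch below uses only base R and the built-in iris data. It shows that matrix-style indexing with a comma drops a single column to a plain vector, while list-style indexing keeps a one-column data frame, which matters when the result is passed to lapply().

```r
# Single-column selection: matrix-style indexing drops to a vector,
# list-style indexing keeps a data frame
is.data.frame(iris[, "Petal.Length"])  # FALSE: a numeric vector
is.data.frame(iris["Petal.Length"])    # TRUE: a one-column data frame

# lapply() over the vector visits every element rather than one column
length(lapply(iris[, "Petal.Length"], identity))  # 150, one entry per row
length(lapply(iris["Petal.Length"], identity))    # 1, one entry per column
```

A function that should accept a single column name is therefore safer with the list-style form, or with drop = FALSE in the matrix-style form.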
Roman

Here is a solution, with the help from the comments:

################################################################################
#
# transformDivideThousand(dataframe, optional = vectorListOfVariables)
#
# Definition: This function applies a transformation, dividing variables by
# 1000. If the vector is passed it applies the transformation only to the
# variables it lists; otherwise it applies it to all numeric variables in the
# dataframe.
#
# Example: df <- transformDivideThousand (cases, c("label1","label2"))
#
################################################################################
transformDivideThousand <- function(data_frame, listofvars){
    if (missing(listofvars)) {
        data_frame[, sapply(data_frame, is.numeric)] =
        data_frame[, sapply(data_frame, is.numeric)]/1000
    } else {
        for (i in names(data_frame)) {
            if (i %in% listofvars) {
                data_frame[,i] = data_frame[,i]/1000
            }
        }
    }
    return(data_frame)
}
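
For anyone wondering why the version in the question seemed to turn the columns into zeros, here is a small sketch. The toy data frame and the names nested and single are made up for illustration. In the nested version the inner loop runs once for every column of the data frame, so each listed column is divided by 1000 repeatedly. With many columns the values become so small that they likely looked like zeros.

```r
# Toy data frame (hypothetical, for illustration only)
df <- data.frame(a = c(1000, 2000), b = c(3000, 4000), c = c("x", "y"))

# Nested loops as in the question: the inner loop runs once per column of df
nested <- df
for (i in names(nested)) {
  for (i in "a") {
    nested[[i]] <- nested[[i]] / 1000
  }
}
nested$a  # 1000 / 1000^3 and 2000 / 1000^3, because df has three columns

# A single loop over the requested columns that actually exist in df
single <- df
for (i in intersect("a", names(single))) {
  single[[i]] <- single[[i]] / 1000
}
single$a  # 1 and 2
```

The %in% check in the function above has the same effect as the intersect() call in this sketch: each requested column is divided exactly once.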
lf_araujo