The General Problem
I want to vary the additional arguments passed to a function in an lapply/sapply (or maybe mapply?) call. It would be nice to know how to do this in general. For my specific purpose, though, I am trying to incorporate this into a custom function, so hopefully the approach can scale.
Specific Example of Problem
Assume I have the following data frame:
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
As an example, I would like to convert column1 and column2 to factors, each with different levels. I might note the columns and levels as such:
# Columns in df I want to apply the factor() function to.
cols <- c("column1", "column2")
# Desired levels for column1
column1_lvl <- c(letters[1:5])
# Desired levels for column2
column2_lvl <- c(LETTERS[1:6])
Note that I have specified a separate set of levels for each column, each with more levels than exist in df. This serves as a motivation for varying the arguments. Now I test out an lapply call without varying the levels argument to factor:
df[cols] <- lapply(df[,cols], factor)
This works and successfully converts those columns to factors. I redefine df to its original structure for the next step. Now I want to specify the levels for each of the columns. ?lapply says that you can pass additional arguments to FUN, but it doesn't specify how to vary those arguments over each vector in X. Trying this with one instance, I can write this:
df["column1"]<- factor(df[,"column1"], levels = column1_lvl)
This works. But now I want to abstract the levels argument. Unfortunately, this doesn't work with lapply, because no matter what you assign to levels, R will pass that same value to FUN for each of the vectors in X.
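To make the problem concrete, here is a small self-contained sketch (it rebuilds df, cols, and column1_lvl from above; the name res is just for illustration) showing that a single levels value is reused for every column:

```r
# Rebuild the example data so this block runs on its own.
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
cols <- c("column1", "column2")
column1_lvl <- letters[1:5]

# lapply() hands the same `levels` vector to factor() for every column.
res <- lapply(df[, cols], factor, levels = column1_lvl)

levels(res$column1)  # the intended lower-case levels
levels(res$column2)  # the same lower-case levels, which is not what I want
anyNA(res$column2)   # the upper-case values match none of those levels
```

So column2 ends up with column1's levels, and its values become NA because none of them appear in that set.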
Ideally, something like the following would work. The following is FAKE CODE that I wish would work the way I want it, but doesn't:
df[cols] <- lapply(df[, cols], factor, levels = list(column1_lvl, column2_lvl))
What I have tried
I have not been able to find many resources that explain how to accomplish this, or perhaps I just don't see what needs to be done. This post helped me a little, but I'm wondering if there is a way around creating my own factor function, for example.
Additionally, this person's answer to their own question encouraged me to check out mapply. Though I've read the ?mapply documentation and followed along with some tutorials, I haven't been able to figure it out. On that front, I have tried the following code, which doesn't work (for my purposes):
col_levels <- list(column1_lvl, column2_lvl)
df[cols] <- mapply(factor, df[,cols], MoreArgs = col_levels)
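One direction that may be worth checking, offered as a sketch rather than a confirmed solution: as I read ?mapply, MoreArgs holds arguments that are shared by every call, while the arguments to vectorize over go in `...`. Under that assumption, passing the list of levels alongside the columns might look like the following. Map is a wrapper around mapply with SIMPLIFY = FALSE; the anonymous function and the argument name lvl are just for illustration.

```r
# Rebuild the example data so this block runs on its own.
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
cols <- c("column1", "column2")
column1_lvl <- letters[1:5]
column2_lvl <- LETTERS[1:6]
col_levels <- list(column1_lvl, column2_lvl)

# Pair each column with its own levels vector: the first call gets
# (df$column1, column1_lvl), the second gets (df$column2, column2_lvl).
df[cols] <- Map(function(x, lvl) factor(x, levels = lvl),
                df[cols], col_levels)

levels(df$column1)
levels(df$column2)
```

This relies on col_levels being in the same order as cols, since the pairing is purely positional.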
SessionInfo
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1 yaml_2.1.19
Final Thoughts
I could just be having a difficult time knowing what to search for. I am always open to figuring out the problem myself, if you are able to point me in the right direction. Any additional resources are more than welcome.
Thanks in advance!