The General Problem

I want to vary the additional arguments passed on to a function in an lapply/sapply (or maybe mapply?) call, so that each element gets its own value for those arguments. It would be nice to know how to do this in general. If it matters, for my specific purpose I am trying to incorporate this into a custom function, so hopefully the approach can scale.

Specific Example of Problem

Assume I have the following data frame:

df <- data.frame(column1 = letters[1:4], 
             column2 = LETTERS[1:4], 
             column3 = 1:4, 
             stringsAsFactors = FALSE)

As an example, I would like to convert column1 and column2 to factors, each with different levels. I might note the columns and levels as such:

# Columns in df I want to apply the factor() function to.

     cols <- c("column1", "column2")

# Desired levels for column1

     column1_lvl <- c(letters[1:5])

# Desired levels for column2

     column2_lvl <- c(LETTERS[1:6])

Note that I have specified two separate levels for the columns, each with more levels than exist in df. This serves as a motivation for varying the arguments. Now I test out a lapply call without varying the levels argument to factor:

     df[cols] <- lapply(df[,cols], factor)

This works and successfully converts those columns to factors. I redefine df to its original structure for the next step. Now I want to specify the levels for each of the columns. ?lapply says that you can pass additional arguments to FUN, but it doesn't say how to vary those arguments over each vector in X. Trying this with one column, I can write:

     df["column1"]<- factor(df[,"column1"], levels = column1_lvl)

This works. But now I want to abstract the levels argument so it differs per column. Unfortunately, passing it through lapply doesn't do that: whatever you assign to levels, R passes that same value to factor for every vector in X.
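To make that concrete, here is a minimal sketch of the behaviour. It rebuilds df and column1_lvl from above so it runs on its own; `res` is just a name chosen for illustration.

```r
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
cols <- c("column1", "column2")
column1_lvl <- letters[1:5]

# The single `levels` value is handed unchanged to every call of factor()
res <- lapply(df[cols], factor, levels = column1_lvl)

levels(res$column2)      # the lowercase levels meant for column1
anyNA(res$column1)       # FALSE: column1 values all match
all(is.na(res$column2))  # TRUE: "A".."D" match none of the lowercase levels
```

So column2 ends up with column1's levels and all of its values become NA.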

Ideally, something like the following would work. The following is FAKE CODE that I wish would work the way I want it, but doesn't:

     df[cols] <- lapply(df[,cols], factor, levels = list(column1_lvl, column2_lvl))

What I have tried

I have not been able to find many resources that explain how I might be able to accomplish this. Or perhaps, I don't see what needs to be done. This post helped me a little, but I'm wondering if there is a way around creating my own factor function, for example.

Additionally, this person's answer to their own question encouraged me to check out mapply. Though I've read ?mapply's documentation, and followed along with some tutorials, I haven't been able to figure it out. On that front, I have tried the following code, which doesn't work (for my purposes):

     col_levels <- list(column1_lvl, column2_lvl)
     df[cols] <- mapply(factor, df[,cols], MoreArgs = col_levels)

SessionInfo

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.5.1 tools_3.5.1    yaml_2.1.19  

Final Thoughts

I could just be having a difficult time knowing what to search for. I am always open to figuring out the problem myself, if you are able to point me in the right direction. Any additional resources are more than welcome.

Thanks, in advance!

1 Answer

We can use Map to set the levels of each column from the corresponding 'lvl' object, with the 'lvl' objects supplied together in a list:

df[cols] <- Map(function(x, y) factor(x, levels = y),
             df[cols], list(column1_lvl, column2_lvl))

and check the levels of the columns

lapply(df[cols], levels)
#$column1
#[1] "a" "b" "c" "d" "e"

#$column2
#[1] "A" "B" "C" "D" "E" "F"
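Since the question also tried mapply: Map is a thin wrapper around mapply that does not simplify its result, so the same thing can be sketched with mapply directly. This is a sketch under two assumptions taken from the documentation: the per-column levels go in `...` (here as a named `levels` argument) rather than in MoreArgs, which is meant for arguments shared by every call, and SIMPLIFY = FALSE is set so that a list comes back. The data are rebuilt so the block runs on its own.

```r
df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)
cols <- c("column1", "column2")
column1_lvl <- letters[1:5]
column2_lvl <- LETTERS[1:6]

# Element i of df[cols] is paired with element i of the levels list
df[cols] <- mapply(factor, df[cols],
                   levels = list(column1_lvl, column2_lvl),
                   SIMPLIFY = FALSE)

lapply(df[cols], levels)
```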

As the OP asked for a way to solve this with lapply, one option is to loop over the sequence of indices and use each index to subset both the data and the corresponding element of the 'lvls' list

lvls_lst <- list(column1_lvl, column2_lvl)
df[cols] <- lapply(seq_along(lvls_lst), function(i) 
         factor(df[cols][[i]], levels = lvls_lst[[i]]))

NOTE: In both cases, we need to specify the levels explicitly
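As the question mentions wrapping this in a custom function, one possible shape is sketched below. The helper name `set_factor_levels` and its `lvls` argument (a named list, one entry per column to convert) are hypothetical and chosen only for illustration.

```r
# Hypothetical helper: convert the columns named in `lvls` to factors,
# using each list entry as the levels for the column of the same name.
set_factor_levels <- function(data, lvls) {
  stopifnot(is.list(lvls),
            !is.null(names(lvls)),
            all(names(lvls) %in% names(data)))
  data[names(lvls)] <- Map(function(x, l) factor(x, levels = l),
                           data[names(lvls)], lvls)
  data
}

df <- data.frame(column1 = letters[1:4],
                 column2 = LETTERS[1:4],
                 column3 = 1:4,
                 stringsAsFactors = FALSE)

df2 <- set_factor_levels(df, list(column1 = letters[1:5],
                                  column2 = LETTERS[1:6]))
lapply(df2[c("column1", "column2")], levels)
```

Keying the levels by column name avoids relying on the order of two separate objects (the column vector and the levels list) staying in sync.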

akrun
  • Could you explain a little bit more about how the `Map` solution works? I am reading the documentation, but I am having a difficult time getting an intuitive grasp. It "Maps" an anonymous function (in your example) to the first argument of that function, then uses any remaining arguments for the anonymous function's additional arguments? I like how your `lapply` solution doesn't force you to create a function with all the arguments of an existing function (just in case you wanted to vary many more arguments from an existing function), but `Map` seems "cleaner". – Christian Million Sep 24 '18 at 22:11
  • @MillionC - You could even collapse it further and name the inputs to `factor` with `Map` - `df[cols] <- Map(factor, x=df[cols], levels=list(column1_lvl, column2_lvl))`. You don't have to create an anonymous function. – thelatemail Sep 24 '18 at 22:17
  • Thanks, @thelatemail. I just need to spend some time developing a stronger understanding of functionals. The code works, and my problem is solved, but I still don't understand how `Map` works. It is interesting that the vectors over which you are mapping the function are also the function's arguments. It's a new way of thinking about it for me. Anyway, thanks for your help! – Christian Million Sep 24 '18 at 22:28
  • @MillionC - all these functions are literally just loops. It is saying run `factor()` with the first part of the supplied `x=` argument + the first part of the supplied `levels=` argument. Then repeat for the 2nd, the 3rd, the nth part of each argument. A simple example that might help to understand is `Map(paste, 1:3, c("a","b","c"))` - which `paste`s together `1/a`, then `2/b`, then `3/c`. Good luck with it! – thelatemail Sep 24 '18 at 22:43
  • @MillionC I used an anonymous function to make it more obvious. You can specify the `levels` as thelatemail showed. – akrun Sep 25 '18 at 02:59
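
The "just loops" description in the comments can be sketched by writing the `paste` example out by hand. This is only an illustration of the idea, not how `Map` is implemented internally, and it ignores the recycling of shorter arguments that `Map` also performs; `x`, `y`, `out` and `mapped` are names chosen for the example.

```r
x <- 1:3
y <- c("a", "b", "c")

# Hand-written equivalent: call paste() on the i-th element of each input
out <- vector("list", length(x))
for (i in seq_along(x)) {
  out[[i]] <- paste(x[[i]], y[[i]])
}

mapped <- Map(paste, x, y)

unlist(out)     # "1 a" "2 b" "3 c"
unlist(mapped)  # "1 a" "2 b" "3 c"
```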