I'm struggling with using mapply on functions I write myself when one or more of the arguments stay fixed across calls. I need those arguments because I am programming in a bigger environment; for example, one of the arguments is the data.

fun_test <- function(data,col,val1,val2){return(data[col][1,] * val1-val2)}

So data and col can for example be constant, but I want to vary the output of my function depending on val1 and val2:

> mapply(FUN=fun_test,mtcars,"cyl",mtcars$cyl,mtcars$cyl*2)
Error in data[col][1, ] : incorrect number of dimensions

I'm trying to understand how mapply works; I surely cannot pass mtcars, and "cyl" as a vector, can I?

EDIT: I have an environment in which the data may vary, e.g. sometimes I use mtcars, sometimes it is another dataset. So I cannot hardcode the data into the function

EDIT2: 1) I have some dataset, 2) I have different Excel files that I read into R, 3) I write lookup functions that extract information from these Excel files in R, 4) for one or two variables (from the dataset) at a time, I call the lookup functions I created and extract information.

So these lookup functions depend on both the data (the variables I need to lookup) and the Excel-files that I use to do the looking up.

Helen
  • @akrun: are you sure this is a duplicate? I'm struggling with seeing how this helps me. – Helen Jun 04 '19 at 13:24
  • `Map` is a wrapper for `mapply` I guess the issue was the one of the arguments is a constant. So you need to wrap it inside a list, which is the solution provided in the dupe link. If you don't agree with that, I can reopen – akrun Jun 04 '19 at 13:25
  • @akrun: i see, maybe I was too vague when I wrote "constant". I mean that the argument does not change, but the constant argument is not a constant per se, but it's my data. – Helen Jun 04 '19 at 13:29
  • I meant the data or input is repeated for each of the list elements. That is the reason you have answer with `mapply(FUN=fun_test, list(mtcars),"cyl",mtcars$cyl,mtcars$cyl*2)` where the data is wrapped in a `list` – akrun Jun 04 '19 at 13:31
  • @akrun: thanks for elaborating! Tbh, I didn't understand that they were related, I read that answer before I made my Q, but if you want it closed, I understand – Helen Jun 04 '19 at 13:48
  • I think you have a feeling that it is not dupe. Okay, I am reopening – akrun Jun 04 '19 at 13:50

2 Answers

mapply is a multivariate version of sapply. Instead of iterating over just one object (e.g. the columns of a data.frame or the elements of a vector), it iterates over several objects in parallel. Shorter arguments are recycled to the length of the longest one, so a length-one value such as "cyl" is fine. A data.frame, however, is a list of columns, so mapply iterates over its columns (11 of them for mtcars) instead of passing the whole data.frame to each call. That is why you cannot pass mtcars directly as a constant.

Try an easy example (it sums the elements at matching positions of the two vectors):

mapply(sum, 1:10, 11:20)
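
To make the pairing visible, here is a small sketch of how mapply treats its arguments. It is only an illustration: the variable names are made up, and the last call uses just the first two columns of mtcars to keep the output short.

```r
# Element-wise pairing: sum(1, 11), sum(2, 12), sum(3, 13)
paired <- mapply(sum, 1:3, 11:13)
paired
# [1] 12 14 16

# A length-one argument is recycled to match the longer one
recycled <- mapply(function(x, k) x * k, 1:3, 10)
recycled
# [1] 10 20 30

# A data frame is a list of columns, so mapply iterates over the columns
col_lengths <- mapply(function(column) length(column), mtcars[1:2])
col_lengths
# mpg cyl 
#  32  32 
```

The last call shows why passing mtcars directly does not hand the whole data frame to the function: each call only receives one column.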

So, in your case, just put the constants directly into the function:

fun_test <- function(val1, val2){return(mtcars['cyl'] * val1 - val2)}

mapply(FUN=fun_test, mtcars$cyl, mtcars$cyl*2)

Update:

Then I think what you need is to include mapply within your function. In that way you can add any argument you like (both constants and variable). It would look like this:

myfunc <- function(data, col, val1, val2) {

  fun_test <- function(val1, val2) {
    data[col] * val1 - val2 
  }

  mapply(FUN=fun_test, val1, val2)

}

myfunc(mtcars, 'cyl', mtcars$cyl, mtcars$cyl*2)
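
A variation on the same idea, as a sketch and not part of the original answer: keep a four-argument function and fix the constant arguments in an anonymous wrapper, so mapply only iterates over val1 and val2. The name run_lookup is hypothetical, and fun_test here uses data[1, col] so that each call returns a single number.

```r
# Hypothetical four-argument function, modelled on the question's fun_test
fun_test <- function(data, col, val1, val2) {
  data[1, col] * val1 - val2
}

# The wrapper fixes data and col; mapply only iterates over val1 and val2
run_lookup <- function(data, col, val1, val2) {
  mapply(function(v1, v2) fun_test(data, col, v1, v2), val1, val2)
}

result <- run_lookup(mtcars, "cyl", mtcars$cyl, mtcars$cyl * 2)
head(result, 3)
# [1] 24 24 16
```

Because data and col are ordinary arguments of run_lookup, the same wrapper can be called with any other dataset.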
LyzandeR
  • I'm sorry, maybe I wasn't clear enough, but I have an environment in which the data may vary, e.g. sometimes I use mtcars, sometimes it is another dataset. So I cannot hardcode the data into the function. – Helen Jun 04 '19 at 12:49

If you want to pass a data frame as a constant value, wrap it in a list so that the whole data frame is recycled for every call; otherwise mapply will pass each column separately.

fun_test <- function(data,col,val1,val2){return(data[1, col] * val1-val2)}

mapply(FUN=fun_test, list(mtcars),"cyl",mtcars$cyl,mtcars$cyl*2)
#[1] 24 24 16 24 32 24 32 16 16 24 24 ......

So the first value 24 in the output can be reproduced by

mtcars[1, "cyl"] * mtcars$cyl[1] - mtcars$cyl[1]*2
#[1] 24

I know this is an example and the actual implementation is different, but you can get the same output directly with

mtcars[1, "cyl"] * mtcars$cyl - mtcars$cyl*2

To understand the difference between the two calls, we can debug the function by adding browser() to it

fun_test <- function(data,col,val1,val2){
   browser()
   return(data[1, col] * val1-val2)
}

Now call the function and inspect the data argument inside it

mapply(FUN=fun_test, mtcars,"cyl",mtcars$cyl,mtcars$cyl*2)
Browse[1]> data
# [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 
#     10.4 10.4 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3 19.2 27.3 26.0 30.4 
#     15.8 19.7 15.0 21.4

This is the first column of mtcars, which is mpg (check mtcars$mpg).

It is a numeric vector, not a data frame, so subsetting it by "cyl" and then by row 1 fails with the same error

mtcars$mpg["cyl"][1, ]

Error in mtcars$mpg["cyl"][1, ] : incorrect number of dimensions

Now, in the second case, where we pass the data frame wrapped in a list, check data again

 mapply(FUN=fun_test, list(mtcars),"cyl",mtcars$cyl,mtcars$cyl*2)

Browse[1]> data
#                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#....

It is the complete data frame, so you can subset it as intended

>data[1, "cyl"]
#[1] 6
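
As an alternative to wrapping the data frame in list(), mapply also has a MoreArgs parameter for arguments that should be passed unchanged to every call. The following is a minimal sketch using the same fun_test (without browser()); the names via_more_args and via_list are only illustrative.

```r
fun_test <- function(data, col, val1, val2) {
  data[1, col] * val1 - val2
}

# data and col are passed once via MoreArgs; val1 and val2 are iterated over
via_more_args <- mapply(
  FUN = fun_test,
  val1 = mtcars$cyl,
  val2 = mtcars$cyl * 2,
  MoreArgs = list(data = mtcars, col = "cyl")
)

# Same computation with the data frame wrapped in list()
via_list <- mapply(FUN = fun_test, list(mtcars), "cyl", mtcars$cyl, mtcars$cyl * 2)

# Compare the two results
all.equal(unname(via_more_args), unname(via_list))
```

Naming the iterated arguments (val1, val2) keeps the call readable when the constant arguments come first in the function signature.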

PS - I don't know the context of why this is being done, and I believe there are better ways to handle it.

Ronak Shah
  • Could you elaborate what you mean by the list being recycled completely? As for context, as I mentioned in the edit in my OP: 1) I have data some dataset, 2) I have different Excel-files that I read into R, 3) I make a lookup function that extracts information from these Excel-files in R, 4) for one or two variables (from the dataset) at the time I go into the lookup-functions I created and extract information. – Helen Jun 04 '19 at 13:06
  • @Erosennin I have added some explanation. – Ronak Shah Jun 04 '19 at 13:22
  • I think this should be added to R documentation, as the presence of a `MoreArgs` parameter is a source of confusion. – Dominic Comtois Jan 23 '21 at 01:12