
In base R, sapply has a safer (and sometimes faster) variant called vapply. mapply is a multivariate version of sapply.

I am running into an edge case when using mapply: a length-0 input to mapply (not to FUN) yields list() instead of integer(0).

Is there a vapply equivalent of mapply that allows specifying FUN.VALUE (the expected return value type/dimension)?

If not, what is the recommended pattern in those situations?

A toy example:

size_of_union <- function(A, B) length(union(A, B))
# normal case:
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
mapply(size_of_union, x, y)
#> [1] 3 1 1

# edge-case:
x <- integer(0)
y <- integer(0)
mapply(size_of_union, x, y)
#> list()  # integer(0) would be desired here

A more contrived toy example:

range_of_intersect <- function(A, B) range(intersect(A, B))

x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
mapply(range_of_intersect, x, y)
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3]
#> [1,]    3    2  Inf
#> [2,]    3    2 -Inf


x <- numeric(0)
y <- numeric(0)
mapply(range_of_intersect, x, y)
#> list() # structure(numeric(0), .Dim = c(2L, 0L)) would be desired
jan-glx
  • You cannot have a `vapply` that will return `integer(0)` and `integer(1)` etc. In this case you will have to rewrite your FUN so as to ensure you do not return elements of length 0. Note that if you are returning an empty list, that's because your function does return NULL – Onyambu Jul 03 '23 at 19:40
  • Just to point out, there is no multivariate equivalent for `vapply`, but you could try `purrr::pmap_int`. Note that if you are looking for non-existent functions, you are doing something incorrectly – Onyambu Jul 03 '23 at 19:43
  • You also missed my point. You definitely cannot use `X` like this. If `X` is just `character(0)`, there is no need for `vapply`. Just `as.integer(X)` should do. Now try having `X = list(character(0), character(1))` and run your code. It won't work. *The return value for length-0 inputs to `FUN` (`add_scalar`) DOES matter*. You specified that it should be `integer(1)` for all the cases of `X`, therefore you cannot mix the return values to have some with `integer(0)` and others with `integer(1)`. Try any example with `X` being a list. – Onyambu Jul 04 '23 at 12:17
  • @Onyambu I think it wasn't clear from my question that this was about length-0 input *to `mapply`* not *to `FUN`*. I think it is clear now and will remove my earlier comments with which I was trying to clarify. – jan-glx Jul 04 '23 at 16:46
  • What about `as.integer(Map(size_of_union, x, y))`? – moodymudskipper Jul 04 '23 at 16:48
  • @moodymudskipper that works for my use-case of a function returning a scalar (minus the probably minor speed benefits and built-in sanity check of `vapply`). I have now added a more contrived example to make my case. – jan-glx Jul 04 '23 at 17:04
  • You might iterate on the index, and then you can use `vapply()`. Would this work? `vapply(seq_along(x), function(i) range_of_intersect(x[[i]], y[[i]]), numeric(2))` – moodymudskipper Jul 04 '23 at 17:16
  • I went along and answered – moodymudskipper Jul 04 '23 at 17:23
  • Nice, @moodymudskipper. *Now*, I think, a combination of your comments would make an accepted answer ;) perhaps with a quote about the issue from `?Map`? (let me know if you need another hint) – jan-glx Jul 04 '23 at 19:14

3 Answers

In base R, there is no version of mapply() I know of where you could enforce the output value type. You can look into using the pmap_*() functions from the package purrr.

E.g., for your example:

size_of_union <- function(A, B) length(union(A, B))

x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
purrr::pmap_int(list(x, y), size_of_union)
#> [1] 3 1 1

x <- integer(0)
y <- integer(0)
purrr::pmap_int(list(x, y), size_of_union)
#> integer(0) # purrr::pmap_int handles edge-case correctly

A different way of looking at it

Your edge-case is not really about data types, it is about zero-length inputs. vapply() probably just creates an empty vector on the basis of FUN.VALUE in order to put the results inside it, but then it does not iterate at all (the input is of length zero) and so it remains empty.

mapply() works differently, creating a list first and then coercing into an atomic vector/matrix if SIMPLIFY = TRUE (the default). So the placeholder is an empty list(), rather than an empty atomic vector.
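
The difference can be seen directly in a minimal sketch that reuses `size_of_union` from the question. The unary helper `size_of` is hypothetical and only introduced here so that `vapply()` has a one-argument function to call:

```r
size_of_union <- function(A, B) length(union(A, B))
size_of <- function(A) length(A)  # hypothetical unary helper, for illustration only

# mapply() has no results to simplify, so the empty list comes back unchanged
res_mapply <- mapply(size_of_union, integer(0), integer(0))

# vapply() takes the result type from FUN.VALUE, even with zero iterations
res_vapply <- vapply(integer(0), size_of, FUN.VALUE = integer(1))

str(res_mapply)
str(res_vapply)
```

Neither call ever invokes the function; only `vapply()` has enough information to return a typed empty vector.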

This is also why the stopifnot() does not throw an error on zero-length input - it is never called in the first place because no iterations happened.

I would just do as.integer(mapply(..., SIMPLIFY = TRUE)) which converts list() if it occurs to integer(0).
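
As a rough sketch of that coercion applied to the question's `size_of_union` (the variable names `edge` and `normal` are just for illustration):

```r
size_of_union <- function(A, B) length(union(A, B))

# edge case: mapply() returns list(), which as.integer() turns into integer(0)
edge <- as.integer(mapply(size_of_union, integer(0), integer(0)))

# normal case: the result is already an integer vector, so the values are unchanged
normal <- as.integer(mapply(size_of_union, list(1:3, 2, 3), list(3, 2, numeric(0))))

str(edge)
str(normal)
```

Note that this only papers over the empty case; it does not check that every call returned exactly one integer.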

So: If the question is "How to solve this edge case?" then this is it. If the question is "How to make base R behave like purrr?" (ensuring all resulting elements are of correct type) then I don't think there is a generally accepted pattern.

jakub
  • Thanks, that works well but relies on external packages. – jan-glx Jul 04 '23 at 12:04
  • That's right. Would you like to add to the question the "recommended pattern" has to be using base R? – jakub Jul 04 '23 at 16:14
  • (I) I added another example but can't figure out how to make it work with `purrr`, using `purrr::pmap_vec` with `.ptype`?? – jan-glx Jul 04 '23 at 19:53
  • (II) `as.integer(mapply(..., SIMPLIFY = TRUE))` will also *not* error if `FUN` always yields a vector of the same size (`mapply` then returns a matrix, which `as.integer` silently converts to a vector), so `as.integer(mapply(..., SIMPLIFY = FALSE))` or, equivalently, `as.integer(Map(...))` would be slightly safer. – jan-glx Jul 04 '23 at 19:56
  • (III) re: "add to question" haha not a requirement but I doubt that making a package and its dependencies a dependency for such a simple function can be the "recommended pattern" (if one anyway depends on any tidyverse packages, sure!) – jan-glx Jul 04 '23 at 19:57
  • (I) sorry, don't know either. (II) I guess that's right, although weirdness can still happen, hard to say how this would behave in a non-toy code. (III) Not sure how many people would stumble upon this thread and think, "How trivial, definitely not worth a dependency!" – jakub Jul 05 '23 at 07:31
  • One more observation: While Mudskipper's solution is clever, it still bothers me a little. If the problem is input length, should it really be the iterator function's job to try to account for it? – jakub Jul 05 '23 at 07:35

For your first case you might use as.integer(Map(size_of_union, x, y))
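
A small sketch of how this behaves, assuming the question's `size_of_union`; the function `two_values` is hypothetical and only there to show what happens when an element is not of length 1:

```r
size_of_union <- function(A, B) length(union(A, B))

# zero-length inputs: Map() returns list(), which as.integer() turns into integer(0)
empty <- as.integer(Map(size_of_union, integer(0), integer(0)))

# hypothetical function returning two values per element, for illustration only
two_values <- function(A, B) c(length(A), length(B))

# as.integer() cannot coerce a list whose elements have length 2, so this call fails
failed <- tryCatch(
  as.integer(Map(two_values, list(1:3, 2), list(3, 2))),
  error = function(e) "error"
)

str(empty)
str(failed)
```

So the `Map()` variant gives a basic length check for free, although it still says nothing about the type the function returned before coercion.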

More generally, you can still use vapply(), but you'll need to loop over the index rather than over parallel vectors:

size_of_union <- function(A, B) length(union(A, B))
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
vapply(seq_along(x), function(i) size_of_union(x[[i]], y[[i]]), integer(1))
#> [1] 3 1 1

x <- integer(0)
y <- integer(0)
vapply(seq_along(x), function(i) size_of_union(x[[i]], y[[i]]), integer(1))
#> integer(0)

range_of_intersect <- function(A, B) range(intersect(A, B))
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
res <- vapply(seq_along(x), function(i) range_of_intersect(x[[i]], y[[i]]), numeric(2))
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
res
#>      [,1] [,2] [,3]
#> [1,]    3    2  Inf
#> [2,]    3    2 -Inf
dput(res)
#> structure(c(3, 3, 2, 2, Inf, -Inf), dim = 2:3)

x <- numeric(0)
y <- numeric(0)
res <- vapply(seq_along(x), function(i) range_of_intersect(x[[i]], y[[i]]), numeric(2))
res
#>     
#> [1,]
#> [2,]
dput(res)
#> structure(numeric(0), dim = c(2L, 0L))

Created on 2023-07-04 with reprex v2.0.2

moodymudskipper
  • I accepted your answer since it was the "most helpful in finding my solution". See [my answer below](https://stackoverflow.com/a/76690733/1870254) for a slightly better approach. – jan-glx Jul 14 '23 at 19:49
  • This is clever, feel free to accept your own answer if you feel it answers the question better. I'm happy that I could help – moodymudskipper Jul 15 '23 at 00:42

I find a combination of vapply and mapply/Map easier to use than vapply over indices.

Here, mapply (with SIMPLIFY = FALSE) or Map maps the inputs to a list of return values (turning the multivariate problem into a univariate one), while vapply (with FUN = identity) only takes care of checking the return value types and simplifying the output appropriately.

Use it directly with either of:

vapply(mapply(my_fun, my_params1, my_params2, SIMPLIFY = FALSE), FUN = identity, FUN.VALUE = my_restype)
vapply(Map(my_fun, my_params1, my_params2), FUN = identity, FUN.VALUE = my_restype)

Or using either of the following shorthands:

vMap <- function(FUN, FUN.VALUE, ...)
  vapply(Map(FUN, ...), FUN = identity, FUN.VALUE = FUN.VALUE)
vmapply <- function(FUN, FUN.VALUE, ..., MoreArgs = NULL)
  vapply(mapply(FUN = FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE), FUN = identity, FUN.VALUE = FUN.VALUE)
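
A short usage sketch of these shorthands, repeating their definitions so the block runs on its own. The variable names `ok`, `empty` and `mismatch` are only for illustration:

```r
# shorthands as defined in this answer
vMap <- function(FUN, FUN.VALUE, ...)
  vapply(Map(FUN, ...), FUN = identity, FUN.VALUE = FUN.VALUE)
vmapply <- function(FUN, FUN.VALUE, ..., MoreArgs = NULL)
  vapply(mapply(FUN = FUN, ..., MoreArgs = MoreArgs, SIMPLIFY = FALSE),
         FUN = identity, FUN.VALUE = FUN.VALUE)

size_of_union <- function(A, B) length(union(A, B))

# normal case: one integer per pair of inputs
ok <- vMap(size_of_union, integer(1), list(1:3, 2, 3), list(3, 2, numeric(0)))

# edge case: zero-length inputs still give a typed result
empty <- vmapply(size_of_union, integer(1), integer(0), integer(0))

# declaring the wrong return type makes vapply() raise an error
mismatch <- tryCatch(
  vMap(size_of_union, character(1), list(1:3, 2), list(3, 2)),
  error = function(e) "error"
)

str(ok)
str(empty)
str(mismatch)
```

Placing FUN.VALUE right after FUN keeps the call order close to `vapply()`, at the cost of differing from `mapply()`, where the vectors follow FUN directly.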

Simple example:

size_of_union <- function(A, B) length(union(A, B))
## normal case:
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
vapply(Map(size_of_union, x , y), FUN = identity, FUN.VALUE = integer(1))
#> [1] 3 1 1

## edge-case:
x <- integer(0)
y <- integer(0)
vapply(Map(size_of_union, x , y), FUN = identity, FUN.VALUE = integer(1))
#> integer(0)

More contrived example:

range_of_intersect <- function(A, B) range(intersect(A, B))

## normal case:
x <- list(1:3, 2, 3)
y <- list(3, 2, numeric(0)) 
vapply(Map(range_of_intersect, x , y), FUN = identity, FUN.VALUE = numeric(2))
#> Warning in min(x): no non-missing arguments to min; returning Inf
#> Warning in max(x): no non-missing arguments to max; returning -Inf
#>      [,1] [,2] [,3]
#> [1,]    3    2  Inf
#> [2,]    3    2 -Inf

## edge-case:
x <- numeric(0)
y <- numeric(0)
vapply(Map(range_of_intersect, x , y), FUN = identity, FUN.VALUE = numeric(2))
#>     
#> [1,]
#> [2,]

from ?Map (emphasis mine):

Map is a simple wrapper to mapply which does not attempt to simplify the result, similar to Common Lisp's mapcar (with arguments being recycled, however). Future versions may allow some control of the result type.

This hints that a multivariate version of vapply is not yet implemented in base R. And given the rare need for it and the minimal extra effort incurred by using the approach presented in this answer, it probably never will be.

Thanks to @moodymudskipper for leading me to this solution!

jan-glx