I'm going through the examples of map() from 'R For Data Science'.

One example is:

library(dplyr)
library(purrr)
df <- tibble(
  a = rnorm(10),
  b = rnorm(10),
  c = rnorm(10),
  d = rnorm(10)
)

df
#> # A tibble: 10 x 4
#>         a      b       c       d
#>     <dbl>  <dbl>   <dbl>   <dbl>
#>  1 -0.570  1.48   2.37    1.60  
#>  2  0.122  2.08   0.222   0.0338
#>  3 -0.890  0.429 -1.75   -1.48  
#>  4  0.334  0.854  0.849  -0.525 
#>  5  1.22  -0.378 -1.00   -0.147 
#>  6 -1.04  -0.427 -1.18    0.907 
#>  7 -0.392  0.102  0.0951  0.842 
#>  8  0.893  0.932  0.620  -0.911 
#>  9  1.00   0.616 -0.937  -0.0286
#> 10  0.190  1.12  -1.02    1.45

In the map_dbl() call below, I don't need to add a tilde before the function, and I don't have to use the . placeholder.

df %>% map_dbl(mean)

#>           a           b           c           d 
#>  0.08714704  0.68069227 -0.17382734  0.17470388

Whereas, in the example below, I do have to put the ~ before the .f and I also have to specify data = .

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(~ lm(mpg ~ wt, data = .))
models

I've tried reading previous answers, e.g. "What is meaning of first tilde in purrr::map", but I'm still unsure exactly when I need to use the tilde and the . placeholder.

Would perhaps the easiest way be for me to just always include those two things, even if they aren't strictly necessary?

Jeremy K.

2 Answers

I am not an expert on map, but here is why I think you have to use the tilde in this case. I believe it has to do with applying a formula versus a function.

Here is an example where you would not have to. In this case I am taking a list of the cylinders and sending it to a function:

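# Unique cylinder counts as a plain vector
cyl <- mtcars %>%
  select(cyl) %>%
  unique() %>%
  unlist()

# Fit mpg ~ wt on the rows of mtcars with the given cylinder count
model <- function(CYL) {
  lm(mpg ~ wt, data = mtcars %>%
       filter(cyl == !!CYL))
}

# model is an ordinary one-argument function, so no tilde is needed
cyl %>%
  map(model)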

In your example you are applying a formula, not a function. Here is another example where you have to use a tilde:

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(~.$mpg+.$cyl)
models

The help page for map says: "The map functions transform their input by applying a function to each element and returning a vector the same length as the input." I believe the tilde turns your formula into a function.
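
To make that conversion visible, here is a minimal sketch that calls purrr's as_mapper() directly on the formula from the example above. The name add_cols is made up for illustration, and the sketch assumes as_mapper() mirrors what map does with its .f argument.

```r
library(purrr)

# Convert the one-sided formula into a function by hand.
# add_cols is a hypothetical name used only for this illustration.
add_cols <- as_mapper(~ .$mpg + .$cyl)

# The formula has become a function
is.function(add_cols)

# `.` is bound to whatever is passed in, here the first three rows of mtcars
first_rows <- head(mtcars, 3)
add_cols(first_rows)

# The same computation written out without the mapper
first_rows$mpg + first_rows$cyl
```

An ordinary function such as mean or the model function above needs no such conversion, which is why it can be passed to map without a tilde.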

Bryan Adams

The quick answer to your question is that it is never necessary to use the tilde notation when calling map. There are different ways of calling map, and the tilde notation is one of them. You already described the simplest way of calling map, which works when a function only needs one argument.

df %>% map_dbl(mean)
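
For a one-argument function like mean, the three spellings should give the same result. The sketch below illustrates this on a small made-up data frame (the seed and the column names are arbitrary assumptions, not the data from the question), and it also shows why a bare ~ mean does not work.

```r
library(purrr)

set.seed(1)  # arbitrary seed, only so the example is repeatable
df <- data.frame(a = rnorm(10), b = rnorm(10))

by_name    <- map_dbl(df, mean)                 # existing function passed by name
by_formula <- map_dbl(df, ~ mean(.x))           # formula shorthand; .x (or .) is the current column
by_anon    <- map_dbl(df, function(x) mean(x))  # ordinary anonymous function

all.equal(by_name, by_formula)
all.equal(by_name, by_anon)

# `~ mean` without a call returns the function `mean` itself rather than
# a number, so map_dbl() signals an error instead of returning a result.
bad <- tryCatch(map_dbl(df, ~ mean), error = function(e) e)
inherits(bad, "error")
```

So the tilde only pays off when the call needs more than the bare function name, for example when the element has to go into an argument other than the first.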

However, when functions get more complex, there are basically two ways to call them: either with the tilde notation or with a normal anonymous function.

# normal anonymous function
models <- mtcars %>% 
  split(.$cyl) %>% 
  map(function(x) lm(mpg ~ wt, data = x))

# anonymous mapper function (~)
models <- mtcars %>% 
  split(.$cyl) %>% 
  map(~ lm(mpg ~ wt, data = .))

The tilde notation basically turns a formula into a function, which is often easier to read. Each option can also be turned into a named function, as shown below. Ideally, the named function reduces the underlying call to a single argument (the one that should be looped over), in which case it can be passed to map like any simple function, without further arguments or notation.

# normal named function notation 
lm_mpg_wt <- function(x) {
  lm(mpg ~ wt, data = x)
}

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(lm_mpg_wt)


# named mapper function
mapper_lm_mpg_wt <- as_mapper(~ lm(mpg ~ wt, data = .))

models <- mtcars %>% 
  split(.$cyl) %>% 
  map(mapper_lm_mpg_wt)
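
As a quick check that the four variants above are interchangeable, the following sketch fits the models each way and compares the coefficients. It uses base split(mtcars, mtcars$cyl) instead of the pipe only to keep the block short; the names by_cyl, fits, coefs and same are made up for this illustration.

```r
library(purrr)

by_cyl <- split(mtcars, mtcars$cyl)

lm_mpg_wt <- function(x) lm(mpg ~ wt, data = x)
mapper_lm_mpg_wt <- as_mapper(~ lm(mpg ~ wt, data = .))

fits <- list(
  anonymous    = map(by_cyl, function(x) lm(mpg ~ wt, data = x)),
  formula      = map(by_cyl, ~ lm(mpg ~ wt, data = .)),
  named        = map(by_cyl, lm_mpg_wt),
  named_mapper = map(by_cyl, mapper_lm_mpg_wt)
)

# One coefficient matrix per variant
# (rows: intercept and wt, columns: one per cylinder group)
coefs <- map(fits, ~ sapply(.x, coef))

# TRUE if every variant produced the same coefficients as the first one
same <- all(map_lgl(coefs, ~ isTRUE(all.equal(.x, coefs[[1]]))))
same
```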

Basically, these are your options. Choose whichever is easiest and best suited to your problem. Named functions are best if you need them again. Many think that mapper functions are easier to read, but in the end that is a matter of personal preference.

TimTeaFan