
"lapply() is inferior to map() because lapply() uses for() loops." I have heard this many times from colleagues lately, and it doesn't ring true based on what I know about how for(), the apply family, and purrr work. It also seems at odds with Hadley's Advanced R: Functionals chapter, where he says that 'map() and lapply() are equivalent and if you are only using map() then you might as well be using lapply().' In one of his presentations he also says 'all of these functions use for() loops somewhere deep inside' when referring to purrr functionals. That leaves me with a number of questions.

My understanding is that the primary advantages of purrr are readability, consistency among functions, 'helper functions', and more ways to iterate than the apply family offers. Those are not trivial, but one could argue they are somewhat subjective (for example, if you don't need some of that functionality).

My questions: Is purrr::map() using a for loop? I don't see one in the R source, which suggests it must be in the C code. If so, how does that differ from an R for() loop?
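One way to check where the iteration happens is to print the R-level definitions. A minimal sketch, using only base R for the assertion: the body of base::lapply() contains no for() and hands off to .Internal(), i.e. compiled code. What purrr::map() prints depends on the installed purrr version, so no output is assumed for it; the variable names here are only illustrative.

```r
# Deparse the body of base::lapply() so it can be searched as text.
lapply_body <- deparse(body(base::lapply))

# TRUE if the R-level body delegates to compiled code via .Internal().
uses_internal <- any(grepl(".Internal", lapply_body, fixed = TRUE))

# TRUE if a literal R-level for() loop appears in the body.
has_r_for <- any(grepl("for (", lapply_body, fixed = TRUE))

print(lapply_body)
print(c(uses_internal = uses_internal, has_r_for = has_r_for))

# purrr may not be installed, and its internals differ between versions,
# so just print whatever definition is available.
if (requireNamespace("purrr", quietly = TRUE)) {
  print(purrr::map)
}
```

This only shows the R-level surface; whatever loop exists lives in the compiled code that these definitions call into.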

Are there any performance differences (speed of code execution or memory allocation) between purrr functions and their apply analogues? I am thinking specifically of purrr::map() vs lapply(), but am interested in any comparison.

If I test the following overly simple case, lapply() executes faster than map(). If map() has any performance advantages, under what conditions do they kick in?

library(purrr)
microbenchmark::microbenchmark(
  "FN1" = {FN1 <- map(mtcars, sd)},
  "FN2" = {FN2 <- lapply(mtcars, sd)},
  times = 100
)
TBP
  • I have never heard the saying *lapply inferior to map*. All that I know of is the readability of map, the lambda function written with formula notation, and the ability to choose the return type (i.e. _dbl, _chr, _lgl, _df, etc.). Other than that, I do not see a difference between map and lapply. Note that this is a duplicate question (https://stackoverflow.com/questions/45101045/why-use-purrrmap-instead-of-lapply) – Onyambu Feb 07 '22 at 17:37
  • FYI, if you start with `mt <- do.call(rbind, replicate(10000, mtcars, simplify=FALSE))` (320K rows), then you start to see performance parity, à la `bench::mark(lapply(mt, sd), purrr::map(mt, sd))`. Running it several times, I see it alternate(-ish) as to which is faster. When the data is 10x larger (3.2M rows), `map` is a little faster much more of the time. To me, that suggests the performance payoff is in the millions of rows or higher. – r2evans Feb 07 '22 at 18:03
  • I'm pretty sure your understanding is correct. – Ben Bolker Feb 07 '22 at 18:07
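The scaling experiment described in the comments can be sketched as below. This is only a sketch under assumptions: it needs the purrr and bench packages (the benchmark is skipped if either is missing), and `n_copies` is an illustrative knob set smaller than the 10000 used in the comment so that it runs quickly. No timings are asserted, since they vary by machine and package version.

```r
# Stack mtcars n_copies times to get a taller data frame.
# n_copies is an illustrative parameter; the comment above used 10000 (320K rows).
n_copies <- 1000
mt <- do.call(rbind, replicate(n_copies, mtcars, simplify = FALSE))

# Only run the comparison when both packages are available.
if (requireNamespace("purrr", quietly = TRUE) &&
    requireNamespace("bench", quietly = TRUE)) {
  res <- bench::mark(
    lapply = lapply(mt, sd),
    map    = purrr::map(mt, sd)
  )
  # Median time and memory allocated are the columns relevant to the question.
  print(res[, c("expression", "median", "mem_alloc")])
}
```

Raising `n_copies` and rerunning several times is the way to see whether the ordering flips or stabilises as the comment suggests.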
