58

How do I achieve row-wise iteration using purrr::map?

Here's how I'd do it with a standard row-wise apply.

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

lst_result <- apply(df, 1, function(x){
            var1 <- (x[['a']] + x[['b']])
            var2 <- x[['c']]/2
            return(data.frame(var1 = var1, var2 = var2))
          })

However, this is not very elegant, and I would rather do it with purrr. It may (or may not) be faster, too.

matsuo_basho

6 Answers

70

You can use pmap for row-wise iteration. The columns are used as the arguments of whatever function you are using. In your example you would have a three-argument function.

For example, here is pmap using an anonymous function for the work you are doing. Because a data frame is a named list, the columns are passed to the function as named arguments, so the parameter names must match the column names.

pmap(df, function(a, b, c) {
     data.frame(var1 = a + b,
                var2 = c/2) 
     }  ) 
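As a small check of how the columns reach the function, the sketch below lists the parameters in a different order from the columns. It assumes purrr is installed and uses a shortened three-row df for illustration; the name `out` is hypothetical.

```r
library(purrr)

# Shortened illustrative data frame (three rows instead of ten)
df <- data.frame(a = 1:3, b = 11:13, c = 21:23)

# Parameters are deliberately listed in a different order than the columns;
# pmap() passes each column as a named argument, so a, b and c still line up
out <- pmap(df, function(c, a, b) a + b)

out[[1]]
```

The first element is 1 + 11, which suggests that matching is by column name rather than by position when the input is a data frame.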

You can use the purrr tilde "short-hand" for an anonymous function by referring to the columns in order with numbers preceded by two dots.

pmap(df, ~data.frame(var1 = ..1 + ..2,
                var2 = ..3/2)  ) 

If you want to get these particular results as a data.frame instead of a list, you can use pmap_dfr.
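A minimal sketch of the pmap_dfr variant, assuming purrr is installed along with dplyr (which pmap_dfr relies on for row-binding); the name `res` is only illustrative.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# Each call returns a one-row data frame; pmap_dfr() binds them by row
res <- pmap_dfr(df, function(a, b, c) {
  data.frame(var1 = a + b, var2 = c / 2)
})

head(res)
```

In purrr 1.0.0 and later, pmap_dfr is marked as superseded, and calling pmap followed by list_rbind is the suggested replacement, so this form may be preferable only on older versions.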

aosmith
  • 3
    In the first example, what do I do if the df has 100 columns and I only want to manipulate the 90th one? I understand I can refer to it by index number, but I would like to refer to it by name. – matsuo_basho Oct 29 '17 at 18:34
  • 6
    @matsuo_basho If you only want to use a single column, other tools might be more appropriate (e.g., `dplyr::mutate`). However, the documentation for `pmap` points out that you can always use `...` to "absorb unused components of input [the] list". So if the column of interest was named "c", something like `pmap(df, function(c, ...) {data.frame(var1 = c/2) })` would work. – aosmith Nov 01 '17 at 23:22
  • what is `...` used for? – Alvaro Morales Apr 19 '21 at 03:05
  • 1
    @AlvaroMorales It takes all of the rest of the column names so you don't need to refer to every single column name in `pmap()`. There is an example in the documentation `Examples` section of the **map** family of functions that you might find useful! – aosmith Apr 19 '21 at 14:22
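To illustrate the `...` idea from the comments above, here is a short sketch that picks out one column by name and lets `...` absorb the rest. It assumes purrr is installed; `halved` is a hypothetical name, and pmap_dbl is used here only so the result comes back as a numeric vector.

```r
library(purrr)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

# Only column c is named in the signature; a and b are swallowed by `...`
halved <- pmap_dbl(df, function(c, ...) c / 2)

halved
```

Without the `...` parameter, the call would fail with an unused-argument error, because pmap still passes every column to the function.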
10

Note that you're using only vectorized operations in your example, so you could very well do:

df %>% dplyr::transmute(var1 = a+b,var2 = c/2)

(or in base R: transform(df,var1 = a+b,var2 = c/2)[4:5])

If you use non-vectorized functions such as median, you can use pmap as in @aosmith's answer, or use dplyr::rowwise.

rowwise is slower, and the package maintainers advise using the map family instead, but it's arguably easier on the eye than pmap in some cases. I personally still use it when speed isn't an issue:

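library(dplyr)
library(purrr)
# pmap_dbl() returns a numeric column; plain pmap() would give a list column
df %>% transmute(var3 = pmap_dbl(., ~median(c(..1, ..2, ..3))))
df %>% rowwise() %>% transmute(var3 = median(c(a, b, c)))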

(To get back to a strictly unnamed list output, with the result stored in res: res %>% split(seq(nrow(.))) %>% unname)

moodymudskipper
6

You are free to always make a wrapper around a function you "like".

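rmap <- function(.x, .f, ...) {
    # Only objects with dimensions (data frames, matrices) have rows
    if (is.null(dim(.x))) stop("dim(X) must have a positive length")
    # t() coerces to a matrix, so mixed column types end up as one common type
    .x <- as.data.frame(t(.x), stringsAsFactors = FALSE)
    # Each original row is now a column, so map() iterates row-wise
    purrr::map(.x = .x, .f = .f, ...)
}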

Apply the new function rmap (row-wise map), here to df1 as defined further down:

rmap(df1,~{
    var1 <- (.x[[1]] + .x[[2]])
    var2 <- .x[[3]]/2
    return(data.frame(var1 = var1, var2 = var2))
    })

Additional info (evaluate from top to bottom):

df1 <- data.frame(a=1:3,b=1:3,c=1:3)
m   <- matrix(1:9,ncol=3)

apply(df1,1,sum)
rmap(df1,sum)

apply(m,1,sum)
rmap(m,sum)

apply(1:10,1,sum)  # intentionally throws an error
rmap(1:10,sum)     # intentionally throws an error
Andre Elrico
4

You can combine pmap with ..., which for me is the best solution because I don't need to specify the parameters.

library(purrr)
library(tibble)

df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

lst_result <- df %>%
  pmap(function(...) {
    # Collect the current row's values into a one-row tibble
    x <- tibble(...)
    tibble(var1 = x$a + x$b, var2 = x$c / 2)
  })
fmassica
4

You can also use group_nest() to access each row as a one-row-tibble:

library(tidyverse)
df <- data.frame(a = 1:10, b = 11:20, c = 21:30)

df %>% 
    group_nest(row_number()) %>% 
    pull(data) %>% 
    map(function(x) transmute(x,
                                 var1 = a + b,
                                 var2 = c/2))
Rasmus Larsen
1

I like (and upvoted) the group_nest answer by @rasmus-larsen, but I think it's cleaner to use group_by and group_map:

library(tidyverse)
df <- data.frame(a = 1:10, b = 11:20, c = 21:30)
lst_result <- df %>% 
  group_by(row_number()) %>%
  group_map(function(x, i) {
    x %>% transmute(
      var1 = a + b,
      var2 = c/2
    )
  })
jrosell