144

When working with plyr I often found it useful to use adply for scalar functions that I have to apply to each and every row.

e.g.

data(iris)
library(plyr)
head(
     adply(iris, 1, transform , Max.Len= max(Sepal.Length,Petal.Length))
    )
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
1          5.1         3.5          1.4         0.2  setosa     5.1
2          4.9         3.0          1.4         0.2  setosa     4.9
3          4.7         3.2          1.3         0.2  setosa     4.7
4          4.6         3.1          1.5         0.2  setosa     4.6
5          5.0         3.6          1.4         0.2  setosa     5.0
6          5.4         3.9          1.7         0.4  setosa     5.4

Now I'm using dplyr more, I'm wondering if there is a tidy/natural way to do this? As this is NOT what I want:

library(dplyr)
head(
     mutate(iris, Max.Len= max(Sepal.Length,Petal.Length))
    )
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
1          5.1         3.5          1.4         0.2  setosa     7.9
2          4.9         3.0          1.4         0.2  setosa     7.9
3          4.7         3.2          1.3         0.2  setosa     7.9
4          4.6         3.1          1.5         0.2  setosa     7.9
5          5.0         3.6          1.4         0.2  setosa     7.9
6          5.4         3.9          1.7         0.4  setosa     7.9
Arun
  • 116,683
  • 26
  • 284
  • 387
Stephen Henderson
  • 6,340
  • 3
  • 27
  • 33
  • i recently asked if there was an equivalent of `mdply` in dplyr, and hadley suggested that they might be brewing something based on `do`. I guess it would also work here. – baptiste Feb 17 '14 at 11:49
  • 4
    Eventually dplyr will have something like `rowwise()` which would group by each individual row – hadley Feb 17 '14 at 16:15
  • @hadley thx, shouldn't it just behave like `adply` when you don't use a grouping though? as its closely integrated function is called `group_by` NOT `split_by` – Stephen Henderson Feb 17 '14 at 17:24
  • @StephenHenderson no, because you also need some way to operate on the table as a whole. – hadley Feb 17 '14 at 17:50
  • Related post: [Call apply-like function on each row of dataframe with multiple arguments from each row](http://stackoverflow.com/questions/15059076) – zx8754 Jan 29 '16 at 10:16
  • I know this is a really old question, but iris %>% mutate(Max.Len=pmax(Sepal.Length,Sepal.Width) ) works also – HowYaDoing May 18 '16 at 19:14
  • 1
    @HowYaDoing Yes but that method doesn't generalise. There is no psum, pmean or pmedian for instance. – Stephen Henderson Sep 18 '17 at 13:00

8 Answers8

227

As of dplyr 0.2 (I think) rowwise() is implemented, so the answer to this problem becomes:

iris %>% 
  rowwise() %>% 
  mutate(Max.Len= max(Sepal.Length,Petal.Length))

Non rowwise alternative

Five years (!) later this answer still gets a lot of traffic. Since it was given, rowwise is increasingly not recommended, although lots of people seem to find it intuitive. Do yourself a favour and go through Jenny Bryan's Row-oriented workflows in R with the tidyverse material to get a good handle on this topic.

The most straightforward way I have found is based on one of Hadley's examples using pmap:

iris %>% 
  mutate(Max.Len= purrr::pmap_dbl(list(Sepal.Length, Petal.Length), max))

Using this approach, you can give an arbitrary number of arguments to the function (.f) inside pmap.

pmap is a good conceptual approach because it reflects the fact that when you're doing row wise operations you're actually working with tuples from a list of vectors (the columns in a dataframe).

alexwhan
  • 15,636
  • 5
  • 52
  • 66
  • I've changed this (from the above) to the ideal answer as I think this is the intended usage. – Stephen Henderson Sep 23 '14 at 14:04
  • 2
    is it possible to add the values of a dynamically formed datatframe? So in this data frame the column names are not known. I am able to add if column names are known. – Arun Raja May 04 '15 at 07:49
  • http://stackoverflow.com/questions/28807266/row-wise-cor-on-subset-of-columns-using-dplyrmutate just found the answer. In this they are using correlation instead of sum. But same concept. – Arun Raja May 04 '15 at 10:07
  • 14
    If it does not work, make sure you are actually using dplyr::mutate not plyr::mutate - drove me nuts – jan-glx Aug 03 '15 at 13:38
  • Thanks YAK, this bit me too. If you include both `plyr` and `dplyr` packages, you are almost certainly using the wrong `mutate` unless you explicitly provide scope `dplyr::mutate`. – Chris Warth Sep 08 '15 at 04:30
  • Be cautious using this approach as `mutate()` can slow down dramatically given large numbers of groups. Groups in the order of 10^5 can mean mutation takes minutes instead of microseconds. See [this gist](https://gist.github.com/MilesMcBain/d08cc4f675e00edf806b7f5f20f29ea2). – MilesMcBain Jul 06 '16 at 00:31
  • If you are reading this for the future: **consider whether your problem can be vectorised first**... For example here, there is already a vectorised `max` function called `pmax`, so using `rowwise` is doubly inefficent: more to type and more time to process. See mnel's answer below for the general case when no vectorised function readily exists. – asachet Dec 02 '16 at 12:25
  • It does not work with the output of the function to be apply by row is something more complex like a list. What is suggested to use in that case? – Erick Chacon Jun 30 '17 at 23:27
  • @ErickChacon I recommend asking that as a separate question – alexwhan Jul 02 '17 at 23:23
  • Thanks so much - I've been banging my head against this issue for a while. It does seem odd that a) this isnt the default behaviour for something built for data frames? b) why would it be slower - shouldn't rowwise mean less processing? – Mike Honey Sep 19 '17 at 14:43
  • This function is currently in "questioning" stage of lifecycle. https://www.tidyverse.org/lifecycle/#questioning – qwr Jun 20 '19 at 22:25
  • you do not need `rowwise` to `mutate(Max.Len= max(Sepal.Length,Petal.Length))`, and it will also have the side effect of grouping and outputting a tibble, instead of a data.frame. – d8aninja Jul 02 '19 at 18:07
  • @d8aninja have you compared the output with and without? It makes a big different to what `max` returns – alexwhan Jul 03 '19 at 03:45
  • @alexwhan Hmm.. I'm really not liking the `purrr::pmap_dbl` method. It's a real conceptual leap. Wouldn't want to teach that in software carpentry ;) – Stephen Henderson Aug 14 '19 at 10:31
  • @StephenHenderson agreed it's a challenge. I've now really come around to it conceptually, but it depends on an understanding of what's actually happening. `rowwise` feels very natural to new users in my experience – alexwhan Aug 26 '19 at 04:06
  • 1
    I'm sure this answer will continue to get lots of traffic...It's worth mentioning, as of mid-2020, `rowwise()` is now accepted (and recommended) in dplyr 1.0.0, and will likely stick around. Hadley was convinced of it's utility and intuitiveness over the purrr solutions. See the [official documentation](https://dplyr.tidyverse.org/reference/rowwise.html) – lioumens May 11 '21 at 19:20
  • 1
    How to give the names of the variables in a *tidyselect* way for the `pmap` approach? – Julien Sep 29 '22 at 07:18
  • "although lots of people seem to find it intuitive" <-- It's R's mission to do the opposite. – Jonh Smith Nov 29 '22 at 12:07
24

The idiomatic approach will be to create an appropriately vectorised function.

R provide pmax which is suitable here, however it also provides Vectorize as a wrapper for mapply to allow you to create a vectorised arbitrary version of an arbitrary function.

library(dplyr)
# use base R pmax (vectorized in C)
iris %>% mutate(max.len = pmax(Sepal.Length, Petal.Length))
# use vectorize to create your own function
# for example, a horribly inefficient get first non-Na value function
# a version that is not vectorized
coalesce <- function(a,b) {r <- c(a[1],b[1]); r[!is.na(r)][1]}
# a vectorized version
Coalesce <- Vectorize(coalesce, vectorize.args = c('a','b'))
# some example data
df <- data.frame(a = c(1:5,NA,7:10), b = c(1:3,NA,NA,6,NA,10:8))
df %>% mutate(ab =Coalesce(a,b))

Note that implementing the vectorization in C / C++ will be faster, but there isn't a magicPony package that will write the function for you.

Jaap
  • 81,064
  • 34
  • 182
  • 193
mnel
  • 113,303
  • 27
  • 265
  • 254
  • thx, this is a great answer, is excellent general R style -idiomatic as you say, but I don't think its really addressing my question whether there is a `dplyr` way... as it would be simpler without dplyr e.g. `with(df, Coalesce(a,b))` Perhaps, that's a kind of answer though - don't use `dplyr` for that? – Stephen Henderson Feb 17 '14 at 12:30
  • 5
    Have to admit I double checked that there isn't a ```magicPony``` package. Too bad – rsoren Mar 30 '16 at 08:31
22

You need to group by row:

iris %>% group_by(1:n()) %>% mutate(Max.Len= max(Sepal.Length,Petal.Length))

This is what the 1 did in adply.

Jaap
  • 81,064
  • 34
  • 182
  • 193
BrodieG
  • 51,669
  • 9
  • 93
  • 146
  • 1
    It seems like there should be a simpler or "nicer" syntax. – Stephen Henderson Feb 16 '14 at 23:33
  • @StephenHenderson, there may be, I'm not a `dplyr` expert. Hopefully someone else will come along with something better. Note I cleaned it up a bit with `1:n()`. – BrodieG Feb 16 '14 at 23:35
  • I suspect you are right, but I sort of feel like the default behaviour with no grouping should be like the `group_by(1:n())` behaviour. If no-one has any other ideas in the morning I'll tick yours ;) – Stephen Henderson Feb 16 '14 at 23:40
  • Also, note that this is somewhat in contravention of documentation for `n`: "This function is implemented special for each data source and can only be used from within summarise.", though it seems to work. – BrodieG Feb 16 '14 at 23:42
  • Can you refer to Sepal.Length and Petal.Length by their index number in some way? If you have lots of variables did would be handy. Like ... Max.len = max( [c(1,3)] ) ? – Rasmus Larsen Jun 19 '14 at 07:43
19

Update 2017-08-03

After writing this, Hadley changed some stuff again. The functions that used to be in purrr are now in a new mixed package called purrrlyr, described as:

purrrlyr contains some functions that lie at the intersection of purrr and dplyr. They have been removed from purrr in order to make the package lighter and because they have been replaced by other solutions in the tidyverse.

So, you will need to install + load that package to make the code below work.

Original post

Hadley frequently changes his mind about what we should use, but I think we are supposed to switch to the functions in purrr to get the by row functionality. At least, they offer the same functionality and have almost the same interface as adply from plyr.

There are two related functions, by_row and invoke_rows. My understanding is that you use by_row when you want to loop over rows and add the results to the data.frame. invoke_rows is used when you loop over rows of a data.frame and pass each col as an argument to a function. We will only use the first.

Examples

library(tidyverse)

iris %>% 
  by_row(..f = function(this_row) {
    browser()
  })

This lets us see the internals (so we can see what we are doing), which is the same as doing it with adply.

Called from: ..f(.d[[i]], ...)
Browse[1]> this_row
# A tibble: 1 × 5
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
         <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1          5.1         3.5          1.4         0.2  setosa
Browse[1]> Q

By default, by_row adds a list column based on the output:

iris %>% 
  by_row(..f = function(this_row) {
      this_row[1:4] %>% unlist %>% mean
  })

gives:

# A tibble: 150 × 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species      .out
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>    <list>
1           5.1         3.5          1.4         0.2  setosa <dbl [1]>
2           4.9         3.0          1.4         0.2  setosa <dbl [1]>
3           4.7         3.2          1.3         0.2  setosa <dbl [1]>
4           4.6         3.1          1.5         0.2  setosa <dbl [1]>
5           5.0         3.6          1.4         0.2  setosa <dbl [1]>
6           5.4         3.9          1.7         0.4  setosa <dbl [1]>
7           4.6         3.4          1.4         0.3  setosa <dbl [1]>
8           5.0         3.4          1.5         0.2  setosa <dbl [1]>
9           4.4         2.9          1.4         0.2  setosa <dbl [1]>
10          4.9         3.1          1.5         0.1  setosa <dbl [1]>
# ... with 140 more rows

if instead we return a data.frame, we get a list with data.frames:

iris %>% 
  by_row( ..f = function(this_row) {
    data.frame(
      new_col_mean = this_row[1:4] %>% unlist %>% mean,
      new_col_median = this_row[1:4] %>% unlist %>% median
    )
  })

gives:

# A tibble: 150 × 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species                 .out
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>               <list>
1           5.1         3.5          1.4         0.2  setosa <data.frame [1 × 2]>
2           4.9         3.0          1.4         0.2  setosa <data.frame [1 × 2]>
3           4.7         3.2          1.3         0.2  setosa <data.frame [1 × 2]>
4           4.6         3.1          1.5         0.2  setosa <data.frame [1 × 2]>
5           5.0         3.6          1.4         0.2  setosa <data.frame [1 × 2]>
6           5.4         3.9          1.7         0.4  setosa <data.frame [1 × 2]>
7           4.6         3.4          1.4         0.3  setosa <data.frame [1 × 2]>
8           5.0         3.4          1.5         0.2  setosa <data.frame [1 × 2]>
9           4.4         2.9          1.4         0.2  setosa <data.frame [1 × 2]>
10          4.9         3.1          1.5         0.1  setosa <data.frame [1 × 2]>
# ... with 140 more rows

How we add the output of the function is controlled by the .collate param. There's three options: list, rows, cols. When our output has length 1, it doesn't matter whether we use rows or cols.

iris %>% 
  by_row(.collate = "cols", ..f = function(this_row) {
    this_row[1:4] %>% unlist %>% mean
  })

iris %>% 
  by_row(.collate = "rows", ..f = function(this_row) {
    this_row[1:4] %>% unlist %>% mean
  })

both produce:

# A tibble: 150 × 6
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  .out
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr> <dbl>
1           5.1         3.5          1.4         0.2  setosa 2.550
2           4.9         3.0          1.4         0.2  setosa 2.375
3           4.7         3.2          1.3         0.2  setosa 2.350
4           4.6         3.1          1.5         0.2  setosa 2.350
5           5.0         3.6          1.4         0.2  setosa 2.550
6           5.4         3.9          1.7         0.4  setosa 2.850
7           4.6         3.4          1.4         0.3  setosa 2.425
8           5.0         3.4          1.5         0.2  setosa 2.525
9           4.4         2.9          1.4         0.2  setosa 2.225
10          4.9         3.1          1.5         0.1  setosa 2.400
# ... with 140 more rows

If we output a data.frame with 1 row, it matters only slightly which we use:

iris %>% 
  by_row(.collate = "cols", ..f = function(this_row) {
    data.frame(
      new_col_mean = this_row[1:4] %>% unlist %>% mean,
      new_col_median = this_row[1:4] %>% unlist %>% median
      )
  })

iris %>% 
  by_row(.collate = "rows", ..f = function(this_row) {
    data.frame(
      new_col_mean = this_row[1:4] %>% unlist %>% mean,
      new_col_median = this_row[1:4] %>% unlist %>% median
    )
  })

both give:

# A tibble: 150 × 8
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  .row new_col_mean new_col_median
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr> <int>        <dbl>          <dbl>
1           5.1         3.5          1.4         0.2  setosa     1        2.550           2.45
2           4.9         3.0          1.4         0.2  setosa     2        2.375           2.20
3           4.7         3.2          1.3         0.2  setosa     3        2.350           2.25
4           4.6         3.1          1.5         0.2  setosa     4        2.350           2.30
5           5.0         3.6          1.4         0.2  setosa     5        2.550           2.50
6           5.4         3.9          1.7         0.4  setosa     6        2.850           2.80
7           4.6         3.4          1.4         0.3  setosa     7        2.425           2.40
8           5.0         3.4          1.5         0.2  setosa     8        2.525           2.45
9           4.4         2.9          1.4         0.2  setosa     9        2.225           2.15
10          4.9         3.1          1.5         0.1  setosa    10        2.400           2.30
# ... with 140 more rows

except that the second has the column called .row and the first does not.

Finally, if our output is longer than length 1 either as a vector or as a data.frame with rows, then it matters whether we use rows or cols for .collate:

mtcars[1:2] %>% by_row(function(x) 1:5)
mtcars[1:2] %>% by_row(function(x) 1:5, .collate = "rows")
mtcars[1:2] %>% by_row(function(x) 1:5, .collate = "cols")

produces, respectively:

# A tibble: 32 × 3
     mpg   cyl      .out
   <dbl> <dbl>    <list>
1   21.0     6 <int [5]>
2   21.0     6 <int [5]>
3   22.8     4 <int [5]>
4   21.4     6 <int [5]>
5   18.7     8 <int [5]>
6   18.1     6 <int [5]>
7   14.3     8 <int [5]>
8   24.4     4 <int [5]>
9   22.8     4 <int [5]>
10  19.2     6 <int [5]>
# ... with 22 more rows

# A tibble: 160 × 4
     mpg   cyl  .row  .out
   <dbl> <dbl> <int> <int>
1     21     6     1     1
2     21     6     1     2
3     21     6     1     3
4     21     6     1     4
5     21     6     1     5
6     21     6     2     1
7     21     6     2     2
8     21     6     2     3
9     21     6     2     4
10    21     6     2     5
# ... with 150 more rows

# A tibble: 32 × 7
     mpg   cyl .out1 .out2 .out3 .out4 .out5
   <dbl> <dbl> <int> <int> <int> <int> <int>
1   21.0     6     1     2     3     4     5
2   21.0     6     1     2     3     4     5
3   22.8     4     1     2     3     4     5
4   21.4     6     1     2     3     4     5
5   18.7     8     1     2     3     4     5
6   18.1     6     1     2     3     4     5
7   14.3     8     1     2     3     4     5
8   24.4     4     1     2     3     4     5
9   22.8     4     1     2     3     4     5
10  19.2     6     1     2     3     4     5
# ... with 22 more rows

So, bottom line. If you want the adply(.margins = 1, ...) functionality, you can use by_row.

CoderGuy123
  • 6,219
  • 5
  • 59
  • 89
  • 2
    `by_row` is deprecated, calling it says to "use a combination of: tidyr::nest(); dplyr::mutate(); purrr::map()" https://github.com/hadley/purrrlyr/blob/eef6685db8841a246da28f0eb5e3ca1e9536cb8a/R/rows.R#L184 – momeara Feb 20 '18 at 16:16
  • 1
    That's a lot of r's. – qwr Jun 20 '19 at 22:23
14

Extending BrodieG's answer,

If the function returns more than one row, then instead of mutate(), do() must be used. Then to combine it back together, use rbind_all() from the dplyr package.

In dplyr version dplyr_0.1.2, using 1:n() in the group_by() clause doesn't work for me. Hopefully Hadley will implement rowwise() soon.

iris %>%
    group_by(1:nrow(iris)) %>%
    do(do_fn) %>%
    rbind_all()

Testing the performance,

library(plyr)    # plyr_1.8.4.9000
library(dplyr)   # dplyr_0.8.0.9000
library(purrr)   # purrr_0.2.99.9000
library(microbenchmark)

d1_count <- 1000
d2_count <- 10

d1 <- data.frame(a=runif(d1_count))

do_fn <- function(row){data.frame(a=row$a, b=runif(d2_count))}
do_fn2 <- function(a){data.frame(a=a, b=runif(d2_count))}

op <- microbenchmark(
        plyr_version = plyr::adply(d1, 1, do_fn),
        dplyr_version = d1 %>%
            dplyr::group_by(1:nrow(d1)) %>%
            dplyr::do(do_fn(.)) %>%
            dplyr::bind_rows(),
        purrr_version = d1 %>% purrr::pmap_dfr(do_fn2),
        times=50)

it has the following results:

Unit: milliseconds
          expr       min        lq      mean    median        uq       max neval
  plyr_version 1227.2589 1275.1363 1317.3431 1293.5759 1314.4266 1616.5449    50
 dplyr_version  977.3025 1012.6340 1035.9436 1025.6267 1040.5882 1449.0978    50
 purrr_version  609.5790  629.7565  643.8498  644.2505  656.1959  686.8128    50

This shows that the new purrr version is the fastest

momeara
  • 1,341
  • 2
  • 17
  • 29
2

In addition to the great answer provided by @alexwhan, please keep in mind that you need to use ungroup() to avoid side effects. This is because rowwise() is a grouping operation.

iris %>%
    rowwise() %>%
    mutate(Max.Len = max(Sepal.Length, Petal.Length))

will give you:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
 1          5.1         3.5          1.4         0.2 setosa      5.1
 2          4.9         3            1.4         0.2 setosa      4.9
 3          4.7         3.2          1.3         0.2 setosa      4.7
 4          4.6         3.1          1.5         0.2 setosa      4.6
 5          5           3.6          1.4         0.2 setosa      5  
 6          5.4         3.9          1.7         0.4 setosa      5.4
 7          4.6         3.4          1.4         0.3 setosa      4.6
 8          5           3.4          1.5         0.2 setosa      5  
 9          4.4         2.9          1.4         0.2 setosa      4.4
10          4.9         3.1          1.5         0.1 setosa      4.9

Now let's assume that you need to continue with the dplyr pipe to add a lead to Max.Len:

iris %>%
    rowwise() %>%
    mutate(Max.Len = max(Sepal.Length, Petal.Length)) %>%
    mutate(Lead.Max.Len = lead(Max.Len))

This will produce:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len Lead.Max.Len
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>        <dbl>
 1          5.1         3.5          1.4         0.2 setosa      5.1           NA
 2          4.9         3            1.4         0.2 setosa      4.9           NA
 3          4.7         3.2          1.3         0.2 setosa      4.7           NA
 4          4.6         3.1          1.5         0.2 setosa      4.6           NA
 5          5           3.6          1.4         0.2 setosa      5             NA
 6          5.4         3.9          1.7         0.4 setosa      5.4           NA
 7          4.6         3.4          1.4         0.3 setosa      4.6           NA
 8          5           3.4          1.5         0.2 setosa      5             NA
 9          4.4         2.9          1.4         0.2 setosa      4.4           NA
10          4.9         3.1          1.5         0.1 setosa      4.9           NA

NA's are produced as a side effect. This can be corrected with ungroup():

iris %>%
    rowwise() %>%
    mutate(Max.Len = max(Sepal.Length, Petal.Length)) %>%
    ungroup() %>%
    mutate(Lead.Max.Len = lead(Max.Len))

This will produce the desired output:

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species Max.Len lead.max.len
          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>        <dbl>
 1          5.1         3.5          1.4         0.2 setosa      5.1          4.9
 2          4.9         3            1.4         0.2 setosa      4.9          4.7
 3          4.7         3.2          1.3         0.2 setosa      4.7          4.6
 4          4.6         3.1          1.5         0.2 setosa      4.6          5  
 5          5           3.6          1.4         0.2 setosa      5            5.4
 6          5.4         3.9          1.7         0.4 setosa      5.4          4.6
 7          4.6         3.4          1.4         0.3 setosa      4.6          5  
 8          5           3.4          1.5         0.2 setosa      5            4.4
 9          4.4         2.9          1.4         0.2 setosa      4.4          4.9
10          4.9         3.1          1.5         0.1 setosa      4.9          5.4
yal-devi
  • 46
  • 3
1

Something like this?

iris$Max.Len <- pmax(iris$Sepal.Length, iris$Petal.Length)
colcarroll
  • 3,632
  • 17
  • 25
  • 1
    Yes thx, that's a very specific answer. But my example and question are trying to tease out if there is a general `dplyr` solution for any scalar function. – Stephen Henderson Feb 16 '14 at 23:26
  • In general, functions should be vectorized -- if it is a wacky function, you might write `wacky.function <- function(col.1, col.2){...}`, and then `iris.wacky <- wacky.function(iris$Sepal.Length, iris$Petal.Length)`. – colcarroll Feb 16 '14 at 23:32
  • Often they should I guess, but I think when you are using something like `dplyr` or `plyr` or say `data.table` you should try to use their idioms so that your code doesn't become a difficult to share mix of styles. Hence the question. – Stephen Henderson Feb 16 '14 at 23:39
  • The first line of the `plyr` documentation is "plyr is a set of tools that solves a common set of problems: you need to break a big problem down into manageable pieces, operate on each pieces and then put all the pieces back together." This seems like a very different problem for which elementary column operations are the best tool. This also might explain why there's no "natural" `plyr`/`dplyr` command for doing this. – colcarroll Feb 16 '14 at 23:43
  • Uh OK.. but plyr does have a natural way to do this `adply` (see above). – Stephen Henderson Feb 16 '14 at 23:47
  • 5
    To butcher a famous quote: "*If all you have is a plyr you will end up using it for a hammer and a screwdriver too*" – thelatemail Feb 16 '14 at 23:51
0

Just for completeness I am going to change the code of this user from the forgotten answer (and maybe the best answer) of the question: Sum across multiple columns. And apply it to your problem:

iris %>%
  mutate(max = select(.,c('Sepal.Length','Petal.Length')) %>% 
  apply(1, max, na.rm=TRUE))

The Result is expected. Accepted answer said that rowwise is increasingly not recommended, and apply is base R. Uou don't need to import an extra package like purrr.

You can use apply() function with max, min, sum, median, mean. So it's very handy and simple.

rubengavidia0x
  • 501
  • 1
  • 5
  • 18