I am a beginner trying to use dplyr for data analysis. My data come from a few operations ("Ops") and are well ordered. I often need to apply different functions to the observations ("Num") according to the type of operation, then combine the results for analysis.

A trivial example is below:

  X      Num  Ops
  0       37   S
  1       18   R
  2       11   S
  3        3   R
  4       11   S
  5       13   R
  ...     ... ...

I want to add a new column "Num2" according to the values in column "Ops", e.g.:

df %>% mutate(Num2 = ifelse(Ops == "S", Num - 1, Num + 1))

I am not sure if I should do a lot of ifelse assignments -- it feels redundant and inefficient.

There must be a much better solution, maybe using some combinations of "group_by, select, filter". Any suggestions?

Basically I want to figure out if there is a way to group the data according to certain criteria, then apply different functions to different subsets, and finally merge the results back together. Typical dplyr examples I found apply the same function(s) to all subsets.

@eddi below provided a more general solution using data.table. Is there a dplyr equivalent?

Dong
  • You can try the following approach: http://stackoverflow.com/a/19054962/817778 – eddi Mar 11 '15 at 14:41
  • Check [this](http://www.statmethods.net/management/variables.html), [this](http://rprogramming.net/recode-data-in-r/), and [this](http://www.cookbook-r.com/Manipulating_data/Recoding_data/) for ideas and possible alternative techniques. – JasonAizkalns Mar 11 '15 at 14:50
  • Thanks for the suggestions. Those are not exactly what I want. Basically I want to figure out if there is a way to group the data according to certain criteria, apply different functions to different subsets, then merge the results back together. Typical dplyr examples apply the same function(s) to all subsets. – Dong Mar 11 '15 at 17:57
  • @eddi It looks like you indeed provided a more general solution with data.table. Is there a dplyr equivalent? – Dong Mar 13 '15 at 14:03
  • @Dong not sure, I'm not a `dplyr` expert – eddi Mar 13 '15 at 14:41
  • I found this on the [manipulatr Google group](https://groups.google.com/forum/#!topic/manipulatr/PpdENJFbAvw). The author of dplyr says: "there's currently no great solution. See https://github.com/hadley/dplyr/issues/631 for more discussion." – Dong Mar 23 '15 at 22:24

2 Answers

There is a dplyrExtras package that includes a mutate_if function.

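# install dplyrExtras from GitHub
library(devtools)
install_github(repo="skranz/dplyrExtras")
library(dplyr)
library(dplyrExtras)
# code using mutate_if: set the default (Num+1) for every row,
# then overwrite only the rows where Ops == "S"
# (note: newer dplyr versions export their own mutate_if() with a
# different, column-wise meaning, so check which one is in scope)
df %>% 
  mutate(Num2 = Num+1) %>% 
  mutate_if(Ops=="S", Num2 = Num-1)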
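If you would rather not install an extra package, the same "default plus override" result can probably be written with dplyr's own case_when(). This is only a sketch: it assumes a reasonably recent dplyr, and the data frame just copies the six rows shown in the question.

```r
library(dplyr)

# Sample data copied from the table in the question (first six rows only)
df <- data.frame(
  X   = 0:5,
  Num = c(37, 18, 11, 3, 11, 13),
  Ops = c("S", "R", "S", "R", "S", "R"),
  stringsAsFactors = FALSE
)

# One condition per operation; the TRUE branch is the fallback
res_case <- df %>%
  mutate(Num2 = case_when(
    Ops == "S" ~ Num - 1,
    TRUE       ~ Num + 1
  ))

print(res_case)
```

Each additional operation becomes one more `condition ~ value` line, which stays readable longer than nested ifelse calls.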
shadow
  • That seems to be wasteful. I want to do away with conditional operations after I do group_by(Ops). Possible? – Dong Mar 14 '15 at 05:36

You can easily avoid the ifelse for numeric return values. Just convert the condition to numeric and use appropriate numeric calculations.

df %>% mutate(Num2 = Num - 2*(Ops=="S") + 1)
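Why this works: the comparison Ops=="S" is a logical vector, and R coerces TRUE/FALSE to 1/0 in arithmetic, so "S" rows get Num - 2 + 1 = Num - 1 and all other rows get Num + 1. A quick base R check on the values from the question (the vectors are just copied from the sample table):

```r
# Values copied from the question's sample table
Num <- c(37, 18, 11, 3, 11, 13)
Ops <- c("S", "R", "S", "R", "S", "R")

# The original ifelse formulation
via_ifelse <- ifelse(Ops == "S", Num - 1, Num + 1)

# The arithmetic formulation: (Ops == "S") is coerced to 1 or 0
via_arith <- Num - 2 * (Ops == "S") + 1

# Both formulations should agree element by element
stopifnot(isTRUE(all.equal(via_ifelse, via_arith)))
```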
shadow
  • I am looking for more general solutions. The functions are generally more complex and the group_by column has more than two values. – Dong Mar 11 '15 at 06:54
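
For the more general case raised in the comments (more than two values in the grouping column, arbitrary functions per group), one possible sketch is to keep the functions in a named list keyed by the Ops value and look up the right one inside a grouped mutate. The list ops_funs and its two functions are made up for illustration, and Ops is assumed to be a character column (a factor would need as.character(), which the code applies anyway).

```r
library(dplyr)

# Sample data copied from the table in the question (first six rows only)
df <- data.frame(
  X   = 0:5,
  Num = c(37, 18, 11, 3, 11, 13),
  Ops = c("S", "R", "S", "R", "S", "R"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: one function per operation type.
# Real functions could be arbitrarily complex.
ops_funs <- list(
  S = function(x) x - 1,
  R = function(x) x + 1
)

# Within each group, Ops holds a single value, so Ops[1] identifies
# the group and selects the function to apply to that group's Num.
res_general <- df %>%
  group_by(Ops) %>%
  mutate(Num2 = ops_funs[[as.character(Ops[1])]](Num)) %>%
  ungroup()

print(res_general)
```

A grouped mutate keeps the original row order, so no separate merge step is needed, and supporting a new operation only means adding another entry to the list.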