R: Creating a function using dplyr functions

Question

I have a data frame with three variables of interest:

survival time
grouping factor
event indicator (dead: yes or no)

I want to calculate the incidence rate for each group. I do this daily, so it would be great to have a function that does this instead of a long script.

I've tried the following, but it doesn't work.

library(survival)
data(lung) # example data
lung$death <- ifelse(lung$status==1, 0, 1) # event indicator: 0 = survived; 1 = dead.

# Function
func <- function(data_frame, group, survival_time, event) {
     library(epitools)
     table <- data_frame %>%
          filter_(!is.na(.$group)) %>%
          group_by_(.$group) %>%
          summarise_(pt = round(sum(as.numeric(.$survival_time)/365.25)),
                     events = sum(.$event)) %>%
          do(pois.exact(.$events, pt = .$pt/1000, conf.level = 0.95)) %>%
          ungroup() %>%
          transmute_(Category = c(levels(as.factor(.$group))),
                     Events = x,
                     Person_years = pt*1000,
                     Incidence_Rate = paste(format(round(rate, 2), nsmall=2), " (",
                                      format(round(lower, 2), nsmall=2), " to ",
                                      format(round(upper, 2), nsmall=2), ")", 
                                      sep=""))
     return(table)
}

func(lung, sex, time, death)

Error: incorrect length (0), expecting: 228
In addition: Warning message:
In is.na(.$group) : is.na() applied to non-(list or vector) of type 'NULL'

Any ideas? I've read the post about NSE and SE in dplyr and thought I had applied the recommendations correctly.

`.$group` will look for `"group"` instead of the var you pass as `group`. Try `.[[group]]` (I suspect this still isn't the correct way to do SE in dplyr, but it might at least work). One other minor thing: you might not want to use `data_frame` as a var, since that is the name of a dplyr function. — Frank, Feb 29 '16 at 21:00
Thanks for the suggestion @Frank. It did not work, though. A bit tricky, this whole SE thing.. =) — Adam Robinsson, Feb 29 '16 at 21:05
Did you really read `help("nse")`? Your code doesn't look like that. You seem to just use `.$` everywhere and the `_` (SE) functions. — talat, Feb 29 '16 at 21:05
This piping thing is out of hand when someone relies on them for simple function calls and create a mess in the meantime. — Pierre L, Feb 29 '16 at 21:05
but the piping is lovely for many other instances, although it might be a shortcoming in this instance =) — Adam Robinsson, Feb 29 '16 at 21:06
@PierreLafortune, not very constructive comment. Also, I doubt you really mean the piping aspect in it — talat, Feb 29 '16 at 21:07
@AdamRobinsson, here's an example showing the use of `lazyeval::interp` to build dplyr-based functions: http://stackoverflow.com/a/27975126/3521006 — talat, Feb 29 '16 at 21:30
thanks @docendo discimus - I obviously need to read a bit more about this. — Adam Robinsson, Feb 29 '16 at 21:39
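
The point raised in the first comment can be checked in base R alone. The sketch below uses a small made-up data frame (`toy`) and a made-up helper (`get_col`), both hypothetical, to show that `$` always looks for a column literally named `group`, whereas `[[` uses the value stored in the variable. A zero-length result of this kind is consistent with the "incorrect length (0)" error in the question.

```r
# Hypothetical toy data; only the column name matters here
toy <- data.frame(sex = c(1, 2, 2))
group <- "sex"

# `$` does not evaluate its right-hand side: it looks for a column called "group"
literal <- toy$group
print(is.null(literal))   # TRUE, because there is no column named "group"

# `[[` evaluates `group` first and then looks up the column "sex"
by_value <- toy[[group]]
print(by_value)           # 1 2 2

# Inside a function the same thing happens: the argument is never consulted
get_col <- function(d, group) d$group
print(is.null(get_col(toy, sex)))   # TRUE
```

This is why swapping `.$group` for `.[[group]]` only helps if the column name is passed as a string.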

score 2 · Accepted Answer · answered Feb 29 '16 at 21:16


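Here is part of the solution. It covers the filtering, grouping, and summarising steps, but not the `pois.exact` call or the final formatting: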

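library(dplyr)

# Column names are held as strings (lung$death is created as in the question)
data_frame = lung
group = "sex"
survival_time = "time"
event = "death"

# Build each expression as a string and pass it to the standard-evaluation (`_`) verbs
data_frame %>%
  filter_(paste("!is.na(", group, ")")) %>%
  group_by_(group) %>%
  summarise_(
    pt = paste("round(sum(as.numeric(", survival_time, ") / 365.25))"),
    events = paste("sum(", event, ")")
  )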
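
To see what the string-based verbs receive, the pasted expressions can be inspected on their own. This is a minimal sketch using a small made-up data frame (`toy`, hypothetical) in place of `lung`, and base R's `parse()` and `eval()` rather than dplyr, just to show that the strings are ordinary R code.

```r
# Hypothetical toy data standing in for lung
toy <- data.frame(sex = c(1, 2, NA, 2),
                  time = c(365.25, 730.5, 100, 365.25))

group <- "sex"
survival_time <- "time"

# paste() separates its pieces with a space by default
filter_expr <- paste("!is.na(", group, ")")
pt_expr <- paste("round(sum(as.numeric(", survival_time, ") / 365.25))")

print(filter_expr)   # "!is.na( sex )"
print(pt_expr)       # "round(sum(as.numeric( time ) / 365.25))"

# Parse each string into a call and evaluate it with the columns of `toy` in scope
keep <- eval(parse(text = filter_expr)[[1]], toy)
pt <- eval(parse(text = pt_expr)[[1]], toy)

print(keep)   # TRUE TRUE FALSE TRUE
print(pt)     # 4
```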

answered Feb 29 '16 at 21:16

Thierry


I think the "idiomatic" way would be to use `lazyeval::interp` instead of pasting strings together – talat Feb 29 '16 at 21:20
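
Both string pasting and `lazyeval::interp` belong to the older standard-evaluation interface. In current dplyr the underscore verbs are deprecated in favour of tidy evaluation, so a complete version of the function might instead look like the sketch below. It assumes dplyr 1.0 or later (for the `{{ }}` operator and `.groups`), and the function name `incidence_by_group` is hypothetical. The column names `x`, `pt`, `rate`, `lower` and `upper` returned by `pois.exact` are taken from the question's own code, and `sprintf` replaces the question's `paste`/`format` formatting.

```r
library(dplyr)
library(survival)   # provides the lung example data
library(epitools)   # provides pois.exact()

lung$death <- ifelse(lung$status == 1, 0, 1)  # event indicator, as in the question

# Hypothetical function name; columns are passed unquoted and forwarded with {{ }}
incidence_by_group <- function(df, group, survival_time, event) {
  tab <- df %>%
    filter(!is.na({{ group }})) %>%
    group_by({{ group }}) %>%
    summarise(
      pt = round(sum(as.numeric({{ survival_time }}) / 365.25)),  # person-years
      events = sum({{ event }}),
      .groups = "drop"
    )

  # Exact Poisson confidence intervals, expressed per 1000 person-years
  res <- pois.exact(tab$events, pt = tab$pt / 1000, conf.level = 0.95)

  tibble(
    Category = as.character(pull(tab, {{ group }})),
    Events = res$x,
    Person_years = res$pt * 1000,
    Incidence_Rate = sprintf("%.2f (%.2f to %.2f)", res$rate, res$lower, res$upper)
  )
}

out <- incidence_by_group(lung, sex, time, death)
print(out)
```

Passing bare column names keeps the call the same as the one attempted in the question, `func(lung, sex, time, death)`.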
