dplyr crashes when using summarise with segfault error

Question

My dplyr script sometimes crashes in this code segment:

abc.fit <- abc_bySubject %>%
  do(fit = lm(value ~ delta, .)) %>%
  summarise(fvc_intercept = coef(fit)[1],
        fvc_slope = coef(fit)[2])

The crash error is:

 *** caught segfault ***
address 0x7ff041000098, cause 'memory not mapped'

It also occurs, though less frequently, when I run this part in RStudio, where the error is "fatal error - R Session Aborted". It always happens when I source the script from the R command line. I tested it on different machines with lots of RAM. R and all packages are up to date, and I'm using the latest version of Ubuntu.

It may be related to this question: link, but that one is reported as fixed.

Perhaps there is a nicer solution?

Without the data (simulated please) this is almost impossible to debug. Can you make a minimal working example? — Roman Luštrik, Aug 10 '15 at 09:07
I've been having problems with `dplyr-0.4.2` on my setup, so I'm using 0.4.1 with no problems. Can you reproduce the error if you [manually downgrade](http://stackoverflow.com/questions/17082341/installing-older-version-of-r-package) your `dplyr` version? — r2evans, Aug 10 '15 at 09:13
@r2evans I really don't like downgrading, do you know if this still happens with the dev version on github? — spore234, Aug 10 '15 at 09:20
Not certain; it's reported as fixed in [1302](https://github.com/hadley/dplyr/issues/1302) but I'm having trouble compiling it (different than [1306](https://github.com/hadley/dplyr/issues/1306)). I tried earlier with 0.4.2.9002 and it was not fixed, but that was a few weeks ago. — r2evans, Aug 10 '15 at 16:09

akrun · Accepted Answer · 2015-08-10T09:43:51.213

The OP's code works in dplyr_0.4.1.9000. Another option that gets the expected output without summarise is to extract the coefficients (coef) from lm, convert them to a list (as.list), rename the list elements (setNames), and convert the result to a data.frame, all inside do.

library(dplyr)
abc.fit <- abc_bySubject %>%
                do(data.frame(setNames(as.list(coef(lm(value~delta, data=.))),
                            c('fvc_intercept','fvc_slope' ))))

abc.fit
#    Subject fvc_intercept   fvc_slope
#1       1     0.5319503 -0.03147698
#2       2     0.4478791  0.04293860
#3       3     0.4318059 -0.03276570
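
To make the chain inside do easier to follow, here is a minimal sketch of the same steps applied to a single subject. It reuses the simulated data from the "data" section of this answer; the names one_subject, cf and coef_row are only illustrative and are not part of the original code.

```r
# Same simulated data as in the "data" section of this answer
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# Fit the model for one subject only (illustrative name: one_subject)
one_subject <- abc[abc$Subject == 1, ]

# coef() returns a named numeric vector with names "(Intercept)" and "delta"
cf <- coef(lm(value ~ delta, data = one_subject))

# as.list() makes one list element per coefficient,
# setNames() replaces the default names,
# data.frame() turns the list into a one-row data frame
coef_row <- data.frame(setNames(as.list(cf), c("fvc_intercept", "fvc_slope")))
coef_row
```

do() runs this once per group and stacks the resulting one-row data frames, which is why the output has one row per Subject.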

If we need to delete the 'Subject' column, we can ungroup() and then use select to drop it.

abc.fit %>% 
      ungroup() %>%
      select(-Subject)
#  fvc_intercept   fvc_slope
#1     0.5319503 -0.03147698
#2     0.4478791  0.04293860
#3     0.4318059 -0.03276570

Another option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(abc)) and, grouped by the 'Subject' column, get the coefficients (coef) of lm and convert them to a list (as.list). setnames then renames the two coefficient columns.

 library(data.table)
 res <- setnames(setDT(abc)[, as.list(coef(lm(value~delta))),
               by =Subject],2:3, c('fvc_intercept', 'fvc_slope'))[]
 res
 #   Subject fvc_intercept   fvc_slope
 #1:       1     0.5319503 -0.03147698
 #2:       2     0.4478791  0.04293860
 #3:       3     0.4318059 -0.03276570

We can subset the columns of interest from 'res'

res[,-1, with=FALSE]
#   fvc_intercept   fvc_slope
#1:     0.5319503 -0.03147698
#2:     0.4478791  0.04293860
#3:     0.4318059 -0.03276570
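
Note that setDT() converts 'abc' to a data.table by reference. If the original data.frame should stay untouched, a possible variant (a sketch only, with abc_dt and res2 as illustrative names) is to work on a copy made with as.data.table() and to name the columns directly in the grouped expression:

```r
library(data.table)

# Same simulated data as in the "data" section of this answer
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# as.data.table() returns a copy, so 'abc' itself remains a plain data.frame
abc_dt <- as.data.table(abc)

# Fit one lm per Subject and name the two coefficients in the same step,
# so no separate setnames() call is needed
res2 <- abc_dt[, {
  cf <- coef(lm(value ~ delta))
  list(fvc_intercept = cf[[1]], fvc_slope = cf[[2]])
}, by = Subject]
res2
```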

data

set.seed(24)
abc <- data.frame(Subject= rep(1:3,each=10), delta=rnorm(30), value=runif(30))
abc_bySubject <- group_by(abc, Subject)
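
With this data, the results can also be cross-checked in base R, which avoids dplyr entirely while the crash is being investigated. This is only a sketch; fits and base_res are illustrative names.

```r
# Same simulated data as above
set.seed(24)
abc <- data.frame(Subject = rep(1:3, each = 10), delta = rnorm(30), value = runif(30))

# split() gives one data frame per subject; fit one lm on each and keep the coefficients
fits <- lapply(split(abc, abc$Subject), function(d) coef(lm(value ~ delta, data = d)))

# Bind the coefficient vectors into a matrix with one row per subject
base_res <- do.call(rbind, fits)
colnames(base_res) <- c("fvc_intercept", "fvc_slope")
base_res
```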

Thanks, this does indeed seem to fix it. However, it feels a bit slower, and it also includes the ID column of abc_SubjectID, which I have to delete manually. This is only a minor issue, so I'm happy with this solution. — spore234, Aug 10 '15 at 09:21
@spore234 I added the `data.table` option. It should be fast. — akrun, Aug 10 '15 at 09:30
