I have a data frame with 4 million rows and 1.4 million distinct values of a grouping variable. A sample of the data frame looks like this:
> df
date id
1 2015-06-25 4333864
2 2015-06-25 3867895
3 2015-06-25 4333866
4 2015-06-25 4333868
5 2015-06-29 2900522
6 2015-06-29 3609093
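For reference, the sample above can be reproduced with something like this (a sketch built from the printed rows; the full data has the same two columns):

df <- data.frame(
  date = as.Date(c("2015-06-25", "2015-06-25", "2015-06-25",
                   "2015-06-25", "2015-06-29", "2015-06-29")),
  id   = c(4333864, 3867895, 4333866, 4333868, 2900522, 3609093)  # id is numeric, as in the full data
)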
Using this command to compute lagged date differences within each group crashes R on a Mac with 8 GB of memory:
df %>% group_by(id) %>% mutate(dayDiff = date - lag(date))
Is this dplyr being memory-hungry? Is there a more efficient way to accomplish what I need?
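One alternative I have been considering is data.table with grouping by reference; this is just an untested sketch of what I think the equivalent would be (assuming a data.table version that provides shift()), and I have not benchmarked it on the full data yet:

library(data.table)
setDT(df)                                      # convert to data.table by reference (no copy)
df[, dayDiff := date - shift(date), by = id]   # grouped lagged date difference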
Here is the version of dplyr I am using:
Package: dplyr
Type: Package
Version: 0.4.1
The data frame has the following variable types:
> str(df)
'data.frame': 6 obs. of 2 variables:
$ date: Date, format: "2014-07-01" "2014-07-01" "2014-07-01" ...
$ id : num 1793096 2019424 1869572 1869573 1774661 ...