-1

I have a dataframe with an ID and date column. I wish to calculate the days difference between the date1 and next date for a group.

I have already tried with dplyr package , it seems wrong.

hist_trnx1 %>% group_by(card_id) %>% mutate(gap=round(c(NA,diff(purchase_date)), 1))

I would like to get the results as below

   Card_ID         date                  Diff   
1. C_ID_4e6213e9bc 2017-06-25 15:33:07   NA
2: C_ID_4e6213e9bc 2017-07-15 12:10:45   20
3: C_ID_4e6213e9bc 2017-08-09 22:04:29   34 
4: C_ID_4e6213e9bB 2017-03-10 10:06:26   NA #( Because of group change) 
5: C_ID_4e6213e9bB 2017-04-10 01:14:19   30 
6: C_ID_4e6213e9bD 2018-02-24 08:45:05   NA #( Because of group change )
7: C_ID_4e6213e9bD 2018-03-23 08:45:05   29

data

structure(list(card_id = c("C_ID_4e6213e9bc", "C_ID_4e6213e9bc", 
"C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bc"
), purchase_date = structure(c(1498404787, 1500120645, 1502316269, 
1504346786, 1489108459, 1519461905), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = c("card_id", "purchase_date"), class = c("data.table", 
"data.frame"), row.names = c(NA, -6L))
markus
  • 25,843
  • 5
  • 39
  • 58
Lenin S
  • 23
  • 1
  • 5
  • 1
    What is the expected output. Here, I don't see any difference in the 'Card_ID' in your dput – akrun Jan 17 '19 at 11:50
  • My bad .. Here is the correct one – Lenin S Jan 17 '19 at 11:56
  • structure(list(card_id = c("C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bB", "C_ID_4e6213e9B", "C_ID_4e6213e9bD", "C_ID_4e6213e9bD" ), purchase_date = structure(c(1498404787, 1500120645, 1502316269, 1504346786, 1489108459, 1519461905), tzone = "UTC", class = c("POSIXct", "POSIXt"))), .Names = c("card_id", "purchase_date"), class = c("data.table", "data.frame"), row.names = c(NA, -6L), .internal.selfref = ) – Lenin S Jan 17 '19 at 11:56
  • Please update your question with the correct data needed to make your problem reproducible. – erc Jan 17 '19 at 13:19

1 Answers1

6

I'm not sure this is the prettiest way to go, and someone will probably offer a cleaner solution, but this should work (part of the solution found at : subtract value from previous row by group )

First of all, I import your data :

df <- structure(list(card_id = c("C_ID_4e6213e9bc", "C_ID_4e6213e9bc", "C_ID_4e6213e9bB", "C_ID_4e6213e9B", 
                                  "C_ID_4e6213e9bD", "C_ID_4e6213e9bD" ), 
                      purchase_date = structure(c(1498404787, 1500120645, 1502316269, 1504346786, 1489108459, 1519461905), 
                                                tzone = "UTC", class = c("POSIXct", "POSIXt"))), 
                 .Names = c("card_id", "purchase_date"), class = c("data.table", "data.frame"), 
                 row.names = c(NA, -6L))

And then it works when I run :

df <- df %>%
  group_by(card_id) %>%
  arrange(purchase_date) %>%
  mutate(diff = purchase_date - lag(purchase_date, default = first(purchase_date))) %>%
  mutate(diff = round(diff/86400, digits = 2))

The arrange allows you to be sure you're substracting what you want to substract, then the lag function allows you to choose the previous row, and finally the division returns the number of days spent.

I hope that helps you =)

Emmanuel Daveau
  • 344
  • 1
  • 2
  • 9