I am new to programming in R and have not found a solution to this problem.

I have data saved in a data frame, as shown below:

        Material created_date
1    50890000   29/10/2018
2    50890000   17/10/2018
3    50890000   31/05/2018
4    50890000   08/02/2018
5    50890000   09/01/2018
6    50900000   21/12/2018
7    50900000   27/09/2018
8    50900000   24/08/2018
9    50900000   18/05/2018
10   51200000   13/07/2018
11   51210001   08/08/2018
12   51210001   26/07/2018
13   51210001   27/02/2018
14   51210001   17/01/2018
15   51210001   09/01/2018
16   51210002   29/08/2018
17   51210002   08/08/2018
18   51210002   13/04/2018

I would like to calculate four columns:

  • Average difference between consecutive dates, in days
  • The associated standard deviation
  • Average difference between consecutive dates, in working days
  • The associated standard deviation

I have been told to use plyr or dplyr, but as I am a beginner I am not sure how to compute the desired output.

Thank you,

s__
  • Welcome to SO! Please share what you have tried so far, so that we can help you. – s__ Jan 04 '19 at 14:33
  • Check out this question on calculating [number of weekdays](https://stackoverflow.com/questions/5046708/calculate-the-number-of-weekdays-between-2-dates-in-r) – astrofunkswag Jan 04 '19 at 22:15

2 Answers

First, you will need to change created_date to a date that R understands. Do that with:

df$R_date <- as.Date(df$created_date, "%d/%m/%Y")
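
As a quick sanity check after the conversion, it may help to confirm the class and look for values that did not parse, since as.Date returns NA for strings that do not match the given format. This is only a sketch: the first two values are copied from the question, and the third is a made-up entry in a different format, added purely to show the NA behaviour.

```r
# Two dates from the question plus one hypothetical entry in the wrong format
created_date <- c("29/10/2018", "17/10/2018", "2018-10-29")
R_date <- as.Date(created_date, format = "%d/%m/%Y")

class(R_date)                  # should report "Date"
bad_rows <- which(is.na(R_date))  # positions that failed to parse
bad_rows
```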

Now, if you simply want to calculate the difference between dates, a loop (shunned by many) can work:

df$date_diff <- NA_integer_  # create the column first; row 1 has no previous date
for (i in 2:nrow(df)) {
  df$date_diff[i] <- as.integer(df$R_date[i] - df$R_date[i-1])
}

However, seeing your reference to dplyr I wonder if you want to do this for each Material group...
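
If per-Material differences are what you are after, something along these lines might work. It is a sketch rather than a definitive answer: it assumes the dplyr package is installed, uses a small subset of the rows from the question, and sorts each group oldest-first so the gaps come out positive. The names per_material, n_dates, mean_days and sd_days are my own, not anything from the question.

```r
library(dplyr)

# A subset of the rows from the question, for brevity
df <- data.frame(
  Material = c(50890000, 50890000, 50890000, 50900000, 50900000, 51200000),
  created_date = c("29/10/2018", "17/10/2018", "31/05/2018",
                   "21/12/2018", "27/09/2018", "13/07/2018")
)

per_material <- df %>%
  mutate(R_date = as.Date(created_date, format = "%d/%m/%Y")) %>%
  group_by(Material) %>%
  arrange(R_date, .by_group = TRUE) %>%   # oldest first within each Material
  mutate(date_diff = as.integer(R_date - lag(R_date))) %>%  # lag() restarts per group
  summarise(
    n_dates   = n(),
    mean_days = mean(date_diff, na.rm = TRUE),
    sd_days   = sd(date_diff, na.rm = TRUE)
  )

per_material
```

Note that a Material with a single date has no consecutive pair, so its mean comes out as NaN and its standard deviation as NA; a Material with exactly two dates has one gap, so its standard deviation is NA as well.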

Marc

Here's the dplyr approach to the first two of your bullet points:

library(dplyr)

df <- df %>% 
  mutate(
    created_date = as.Date(created_date, "%d/%m/%Y"),
    # difference in days from the previous row (NA for the first row)
    diff = as.integer(created_date - lag(created_date)))
df %>% 
  summarise(n = n(), mval = mean(diff, na.rm = TRUE), std = sd(diff, na.rm = TRUE))

   n      mval      std
1 18 -11.70588 128.4916

Check out the link I left you in the comments about the number of workdays, and try to combine these methods to answer your last two bullets.
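
One possible way to combine the two ideas, in base R only, is sketched below. It assumes that "working day" simply means Monday to Friday (public holidays are ignored) and that a gap counts the days after the earlier date up to and including the later one. The helper name working_days_between is hypothetical, and the three dates are those of one Material from the question.

```r
# Count Monday-to-Friday days after `from`, up to and including `to`.
# Assumption: public holidays are not taken into account.
working_days_between <- function(from, to) {
  if (is.na(from) || is.na(to)) return(NA_integer_)
  if (to <= from) return(0L)
  days <- seq(from + 1, to, by = "day")
  # "%u" is the weekday number: 1 = Monday, ..., 7 = Sunday
  sum(as.integer(format(days, "%u")) <= 5L)
}

# Dates for one Material, sorted oldest first
dates <- sort(as.Date(c("29/10/2018", "17/10/2018", "31/05/2018"),
                      format = "%d/%m/%Y"))

# Working-day gap between each consecutive pair of dates
wd <- vapply(seq_len(length(dates) - 1),
             function(i) working_days_between(dates[i], dates[i + 1]),
             integer(1))

mean(wd)
sd(wd)
```

The same helper could be applied within each Material group, so that the mean and standard deviation of the working-day gaps are computed per Material rather than for a single set of dates.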

astrofunkswag