I have a dataframe that looks like this:

Name  Date
David 2019-12-23
David 2020-1-10
David 2020-2-13
Kevin 2019-2-12
Kevin 2019-3-19
Kevin 2019-5-1
Kevin 2019-7-23

Basically, I'm trying to calculate the date difference between each instance, specific to each person. I am currently using the following for-loop:

df$daysbetween <- with(df, ave(as.numeric(date) , name, 
              FUN=function(x) { z=c(NA,NA); 
                            for( i in seq_along(x)[-(1:2)] ){
                                z <- c(z, (x[i]-x[i-1]))}
                            return(z) }) )

Currently, it calculates the difference between the second and third, and any following instance, perfectly fine. However, it doesn't calculate the difference between the first and second date and I need it to. Where is the error in my code coming from? Would appreciate any help.

  • Is this the real data? If those were `Date` objects, I would expect zero-padded months, so this suggests they are either `character` or `factor`, neither of which will do date-math correctly. (Also, you reference `name` and `date` but this data shows `Name` and `Date`.) – r2evans Jul 28 '20 at 17:48
  • Also, just using `diff()` seems easier than what you might be doing here. It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. Providing a `dput()` allows us to see exactly what format the data is in. – MrFlick Jul 28 '20 at 17:51
  • @r2evans I already converted to date. Also, I changed the names of the factors but rest assured all is working on that front. I'm familiar enough with R to have checked that. – CheckMy Brain Jul 28 '20 at 17:58
  • @MrFlick do you know how I'd set it up based on making sure that only the differences within each person are calculated? – CheckMy Brain Jul 28 '20 at 18:01

2 Answers

 transform(df, diff = ave(Date, Name, FUN = function(x)c(NA,diff(as.Date(x)))))
   Name       Date diff
1 David 2019-12-23 <NA>
2 David  2020-1-10   18
3 David  2020-2-13   34
4 Kevin  2019-2-12 <NA>
5 Kevin  2019-3-19   35
6 Kevin   2019-5-1   43
7 Kevin  2019-7-23   83
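
The same idea can be reproduced end to end with the sample data from the question. This is a minimal sketch, assuming `Date` is stored as character (as the unpadded months suggest); the column name `daysbetween` is taken from the question. Converting with `as.Date()` first and passing a numeric vector to `ave()` keeps the result numeric rather than character.

```r
# Sample data rebuilt from the question; Date is deliberately left as character
df <- data.frame(
  Name = c("David", "David", "David", "Kevin", "Kevin", "Kevin", "Kevin"),
  Date = c("2019-12-23", "2020-1-10", "2020-2-13",
           "2019-2-12", "2019-3-19", "2019-5-1", "2019-7-23"),
  stringsAsFactors = FALSE
)

# Convert once so that date arithmetic is well defined
df$Date <- as.Date(df$Date)

# Days since the previous row within each Name; the first row per person is NA
df$daysbetween <- ave(as.numeric(df$Date), df$Name,
                      FUN = function(x) c(NA, diff(x)))
```

This assumes the rows are already sorted by date within each person, as they are in the question.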
Onyambu

Just use lag from the dplyr package:

Description: Find the "previous" (lag()) or "next" (lead()) values in a vector. Useful for comparing values behind of or ahead of the current values.

library(dplyr)

df %>%
  group_by(name) %>%
  mutate(diff = date - lag(date))  # date must be of class Date

Output:

  name  date       diff   
  <chr> <date>     <drtn> 
1 David 2019-12-23 NA days
2 David 2020-01-10 18 days
3 David 2020-02-13 34 days
4 Kevin 2019-02-12 NA days
5 Kevin 2019-03-19 35 days
6 Kevin 2019-05-01 43 days
7 Kevin 2019-07-23 83 days
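
As for where the original loop goes wrong: `z` starts as `c(NA, NA)` and `seq_along(x)[-(1:2)]` starts at the third element, so the second row of each person is always `NA`. A sketch of a corrected loop follows, assuming the dates have already been converted to numeric as in the question; `gap_loop` is a hypothetical name used only for illustration.

```r
# Hypothetical helper: difference between each value of x and the one before it
gap_loop <- function(x) {
  z <- NA_real_                  # a single NA: only the first row has no predecessor
  for (i in seq_along(x)[-1]) {  # start at the second element, not the third
    z <- c(z, x[i] - x[i - 1])
  }
  z
}

# Kevin's dates from the question, as numeric day counts
dates <- as.Date(c("2019-02-12", "2019-03-19", "2019-05-01", "2019-07-23"))
gap_loop(as.numeric(dates))
# NA 35 43 83
```

The function can be passed as `FUN` to `ave()` in place of the anonymous function in the question.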