I am stuck with a question that I can best illustrate through an example dataset:
> head(posts, 11)
    userId       postId  post.time post.freq
1      JON 100000000000 2016-12-06         1
2     JOSH 100000000001 2017-04-29         1
3    JIMBO 100000000002 2018-08-24         1
4    JAMIE 100000000003 2017-01-29         1
5  JANETTE 100000000004 2018-01-17         1
6      BEN 100000000005 2017-05-03         6
7      BEN 100000000006 2017-01-21         6
8      BEN 100000000007 2017-01-24         6
9      BEN 100000000008 2017-01-23         6
10     BEN 100000000009 2017-01-22         6
11     BEN 100000000010 2018-07-03         6
In this dataset, I want to calculate, for each userId, the time difference between consecutive posts in chronological order. For example, user "BEN" has 6 posts (as post.freq indicates). His earliest post was created on 2017-01-21 and his second on 2017-01-22, so the time difference for the second post is 1 day. The difference between his second and third posts is again 1 day (2017-01-22 and 2017-01-23), and so on. A user's earliest post has no previous post to compare against, so its time difference is NA.
The result should be something like this:
> head(posts, 11)
    userId       postId  post.time post.freq post.timediff
1      JON 100000000000 2016-12-06         1            NA
2     JOSH 100000000001 2017-04-29         1            NA
3    JIMBO 100000000002 2018-08-24         1            NA
4    JAMIE 100000000003 2017-01-29         1            NA
5  JANETTE 100000000004 2018-01-17         1            NA
6      BEN 100000000005 2017-05-03         6            99
7      BEN 100000000006 2017-01-21         6            NA
8      BEN 100000000007 2017-01-24         6             1
9      BEN 100000000008 2017-01-23         6             1
10     BEN 100000000009 2017-01-22         6             1
11     BEN 100000000010 2018-07-03         6           426
I couldn't figure out how to do this. Could anyone help me?
Thanks in advance!
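
For reference, here is a minimal base R sketch of one possible way to compute such a column. It is an illustration under assumptions, not a definitive solution: the data frame is rebuilt by hand from the example above (post.freq is left out because it is not needed), post.time is assumed to be of class Date, and the helper name days_since_prev is made up for this sketch. The idea is to sort each user's posts by date, take diff() of the sorted dates, and write the gaps back in the original row order via ave().

```r
# Rebuild the example data by hand (post.freq omitted; it is not needed here).
posts <- data.frame(
  userId = c("JON", "JOSH", "JIMBO", "JAMIE", "JANETTE", rep("BEN", 6)),
  postId = paste0("1000000000", sprintf("%02d", 0:10)),
  post.time = as.Date(c("2016-12-06", "2017-04-29", "2018-08-24", "2017-01-29",
                        "2018-01-17", "2017-05-03", "2017-01-21", "2017-01-24",
                        "2017-01-23", "2017-01-22", "2018-07-03")),
  stringsAsFactors = FALSE
)

# Hypothetical helper: days since the same user's previous post.
# `d` holds one user's dates as day counts; the earliest post gets NA.
days_since_prev <- function(d) {
  o <- order(d)                      # chronological order within the user
  out <- rep(NA_real_, length(d))
  out[o] <- c(NA, diff(d[o]))        # gap to the preceding post, in days
  out
}

# ave() applies the helper per userId and keeps the original row order.
posts$post.timediff <- ave(as.numeric(posts$post.time), posts$userId,
                           FUN = days_since_prev)
```

With the dates as listed, users with a single post get NA, and BEN's rows (in their original order) come out as 99, NA, 1, 1, 1 and 426 days. Posts made on the same day by the same user would get a gap of 0.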