I am working on a data set, which has two columns: id, date/time. Please find the example below,
id date_time
1 2016-10-29 18:01:03.0000000 +08:00
1 2016-10-29 19:34:17.0000000 +08:00
1 2016-10-30 14:08:03.0000000 +08:00
1 2016-10-30 15:55:12.0000000 +08:00
2 2016-10-31 11:32:12.0000000 +08:00
2 2016-10-31 14:59:56.0000000 +08:00
2 2016-11-01 12:49:44.0000000 +08:00
2 2016-11-01 13:55:16.0000000 +08:00
2 2016-11-01 19:18:22.0000000 +08:00
2 2016-11-01 20:40:48.0000000 +08:00
3 2016-11-01 21:19:50.0000000 +08:00
3 2016-11-02 14:20:15.0000000 +08:00
3 2016-11-02 18:52:27.0000000 +08:00
3 2016-11-02 19:39:32.0000000 +08:00
3 2016-11-03 08:55:41.0000000 +08:00
All I wanted to obtain is two columns such that: column 1 has cumulative occurrences for each id ordered using date and time and column 2 has cumulative dates for each id as shown in the table below,
id date_time occ date
1 2016-10-29 18:01:03.0000000 +08:00 1 1
1 2016-10-29 19:34:17.0000000 +08:00 2 1
1 2016-10-30 14:08:03.0000000 +08:00 3 2
1 2016-10-30 15:55:12.0000000 +08:00 4 2
2 2016-10-31 11:32:12.0000000 +08:00 1 1
2 2016-10-31 14:59:56.0000000 +08:00 2 1
2 2016-11-01 12:49:44.0000000 +08:00 3 2
2 2016-11-01 13:55:16.0000000 +08:00 4 2
2 2016-11-01 19:18:22.0000000 +08:00 5 2
2 2016-11-01 20:40:48.0000000 +08:00 6 2
3 2016-11-01 21:19:50.0000000 +08:00 1 1
3 2016-11-02 14:20:15.0000000 +08:00 2 2
3 2016-11-02 18:52:27.0000000 +08:00 3 2
3 2016-11-02 19:39:32.0000000 +08:00 4 2
3 2016-11-03 08:55:41.0000000 +08:00 5 3
(Note that +8:00 is there as a redundant). To generate column 1 (
occ
): I have tried usingave
withFUN=seq_along
by first splitting date and time, followed byorder
using id, date and time.Q1: Is there any way I can directly sort the date_time column ?
For column 2(date), I have first taken a subset of the data frame, using unique value I am generating the index using
ave
andseq_along
. After that I am merging the two data set in a loop.Q2: Is there a more efficient method to achieve the same ?