The id column identifies a unique observation and the comments column describes the observation. The time column is important to distinguish the first comment from the last one. I want to create a column that hols all information known about a unique observation; the row should contain all previous + current comments on the observation (id). How could I achieve this? I tried dplyer and lag, although ordering on time yet does not happen properly.
#--- library
library(dplyr)
library(data.table)
id <- c(1, 1, 1, 2, 2, 56, 56, 56, 107, 1005, 1005, 7, 7, 7, NA)
time <- c("2017-03-25 12:58:37 GMT", "2017-03-24 05:50:22 GMT", "2017-03-23 19:10:01 GMT", "2017-03-24 13:41:18 GMT", "2017-03-26 05:49:37 GMT", "2017-03-23 16:48:04 GMT", "2017-03-23 18:38:19 GMT",
"2017-03-23 14:50:47 GMT", "2017-03-24 02:02:53 GMT", "2017-03-24 03:10:04 GMT", "2017-03-24 21:01:02 GMT", "2017-03-23 16:16:21 GMT", "2017-03-23 20:42:46 GMT", "2017-03-24 09:03:29 GMT",
"2017-03-23 15:46:00 GMT")
comments <- c("lajsd", 'asdf', 'qwee', 'xcx', 'serf', 'sdfe', 'hyhds', 'wafd', 'aerfd', 'sefr', 'qdfsfe', 'qwewd', 'wqdse', 'qwddr', 'qdwq')
mytable <- data.table(id, time, comments)
View(mytable)
mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S")
mytable %>%
group_by(id, time) %>%
mutate(record = lag(comments))
Ideal solution:
id time comments record
<dbl> <dttm> <chr> <chr>
1 1 2017-03-25 12:58:37 lajsd qwee, asdf, lajsd
2 1 2017-03-24 05:50:22 asdf qwee, asdf
3 1 2017-03-23 19:10:01 qwee qwee
4 2 2017-03-24 13:41:18 xcx xcx
5 2 2017-03-26 05:49:37 serf xcx, serf
6 56 2017-03-23 16:48:04 sdfe wafd, sdfe
7 56 2017-03-23 18:38:19 hyhds wafd, sdfe, hyhds
8 56 2017-03-23 14:50:47 wafd wafd
9 107 2017-03-24 02:02:53 aerfd aerfd
10 1005 2017-03-24 03:10:04 sefr sefr
11 1005 2017-03-24 21:01:02 qdfsfe sefr, qdfsfe
12 7 2017-03-23 16:16:21 qwewd qwewd
13 7 2017-03-23 20:42:46 wqdse qwewd, wqdse
14 7 2017-03-24 09:03:29 qwddr qwewd, wqdse, qwddr
15 NA 2017-03-23 15:46:00 qdwq qdwq
So I tried
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]
id time comments date hour record
1: 1 2017-03-25 12:58:37 lajsd 2017-03-25 12:58:37 lajsd
2: 1 2017-03-24 05:50:22 asdf 2017-03-24 05:50:22 lajsd asdf
3: 1 2017-03-23 19:10:01 qwee 2017-03-23 19:10:01 lajsd asdf qwee
4: 2 2017-03-24 13:41:18 xcx 2017-03-24 13:41:18 xcx
5: 2 2017-03-26 05:49:37 serf 2017-03-26 05:49:37 xcx serf
6: 56 2017-03-23 16:48:04 sdfe 2017-03-23 16:48:04 sdfe
7: 56 2017-03-23 18:38:19 hyhds 2017-03-23 18:38:19 sdfe hyhds
8: 56 2017-03-23 14:50:47 wafd 2017-03-23 14:50:47 sdfe hyhds wafd
9: 107 2017-03-24 02:02:53 aerfd 2017-03-24 02:02:53 aerfd
10: 1005 2017-03-24 03:10:04 sefr 2017-03-24 03:10:04 sefr
11: 1005 2017-03-24 21:01:02 qdfsfe 2017-03-24 21:01:02 sefr qdfsfe
12: 7 2017-03-23 16:16:21 qwewd 2017-03-23 16:16:21 qwewd
13: 7 2017-03-23 20:42:46 wqdse 2017-03-23 20:42:46 qwewd wqdse
14: 7 2017-03-24 09:03:29 qwddr 2017-03-24 09:03:29 qwewd wqdse qwddr
15: NA 2017-03-23 15:46:00 qdwq 2017-03-23 15:46:00 qdwq
Which yields almost a satisfactory result. The only problem is that using time is not possible.
mytable$date <- as.IDate(mytable$time)
mytable$hour <- as.ITime(mytable$time)
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id, date, hour)]
Any thoughts on inclusion of this? can't get it right..
This works
mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
mytable <- arrange(mytable, id, time)
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]