0

The id column identifies a unique observation and the comments column describes the observation. The time column is important to distinguish the first comment from the last one. I want to create a column that hols all information known about a unique observation; the row should contain all previous + current comments on the observation (id). How could I achieve this? I tried dplyer and lag, although ordering on time yet does not happen properly.

#--- library
library(dplyr)
library(data.table)

id <- c(1, 1, 1, 2, 2, 56, 56, 56, 107, 1005, 1005, 7, 7, 7, NA)
time <- c("2017-03-25 12:58:37 GMT", "2017-03-24 05:50:22 GMT", "2017-03-23 19:10:01 GMT", "2017-03-24 13:41:18 GMT", "2017-03-26 05:49:37 GMT", "2017-03-23 16:48:04 GMT", "2017-03-23 18:38:19 GMT",
       "2017-03-23 14:50:47 GMT", "2017-03-24 02:02:53 GMT", "2017-03-24 03:10:04 GMT", "2017-03-24 21:01:02 GMT", "2017-03-23 16:16:21 GMT", "2017-03-23 20:42:46 GMT", "2017-03-24 09:03:29 GMT",
       "2017-03-23 15:46:00 GMT")
comments <- c("lajsd", 'asdf', 'qwee', 'xcx', 'serf', 'sdfe', 'hyhds', 'wafd', 'aerfd', 'sefr', 'qdfsfe', 'qwewd', 'wqdse', 'qwddr', 'qdwq')

mytable <- data.table(id, time, comments)

View(mytable)

mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S")


mytable %>%
  group_by(id, time) %>%
  mutate(record = lag(comments))

Ideal solution:

    id       time            comments   record
   <dbl>              <dttm>    <chr>  <chr>
 1     1 2017-03-25 12:58:37    lajsd   qwee, asdf, lajsd
 2     1 2017-03-24 05:50:22     asdf   qwee, asdf
 3     1 2017-03-23 19:10:01     qwee   qwee 
 4     2 2017-03-24 13:41:18      xcx   xcx
 5     2 2017-03-26 05:49:37     serf   xcx, serf
 6    56 2017-03-23 16:48:04     sdfe   wafd, sdfe
 7    56 2017-03-23 18:38:19    hyhds   wafd, sdfe, hyhds
 8    56 2017-03-23 14:50:47     wafd   wafd
 9   107 2017-03-24 02:02:53    aerfd   aerfd
10  1005 2017-03-24 03:10:04     sefr   sefr
11  1005 2017-03-24 21:01:02   qdfsfe   sefr, qdfsfe
12     7 2017-03-23 16:16:21    qwewd   qwewd
13     7 2017-03-23 20:42:46    wqdse   qwewd, wqdse
14     7 2017-03-24 09:03:29    qwddr   qwewd, wqdse, qwddr
15    NA 2017-03-23 15:46:00     qdwq   qdwq

So I tried

setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]



id                time comments       date     hour            record
 1:    1 2017-03-25 12:58:37    lajsd 2017-03-25 12:58:37             lajsd
 2:    1 2017-03-24 05:50:22     asdf 2017-03-24 05:50:22        lajsd asdf
 3:    1 2017-03-23 19:10:01     qwee 2017-03-23 19:10:01   lajsd asdf qwee
 4:    2 2017-03-24 13:41:18      xcx 2017-03-24 13:41:18               xcx
 5:    2 2017-03-26 05:49:37     serf 2017-03-26 05:49:37          xcx serf
 6:   56 2017-03-23 16:48:04     sdfe 2017-03-23 16:48:04              sdfe
 7:   56 2017-03-23 18:38:19    hyhds 2017-03-23 18:38:19        sdfe hyhds
 8:   56 2017-03-23 14:50:47     wafd 2017-03-23 14:50:47   sdfe hyhds wafd
 9:  107 2017-03-24 02:02:53    aerfd 2017-03-24 02:02:53             aerfd
10: 1005 2017-03-24 03:10:04     sefr 2017-03-24 03:10:04              sefr
11: 1005 2017-03-24 21:01:02   qdfsfe 2017-03-24 21:01:02       sefr qdfsfe
12:    7 2017-03-23 16:16:21    qwewd 2017-03-23 16:16:21             qwewd
13:    7 2017-03-23 20:42:46    wqdse 2017-03-23 20:42:46       qwewd wqdse
14:    7 2017-03-24 09:03:29    qwddr 2017-03-24 09:03:29 qwewd wqdse qwddr
15:   NA 2017-03-23 15:46:00     qdwq 2017-03-23 15:46:00              qdwq

Which yields almost a satisfactory result. The only problem is that using time is not possible.

mytable$date <- as.IDate(mytable$time)
mytable$hour <- as.ITime(mytable$time)
setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id, date, hour)]

Any thoughts on inclusion of this? can't get it right..

This works

mytable$time <- as.POSIXct(mytable$time, format = "%Y-%m-%d %H:%M:%S", tz = 'GMT')
    mytable <- arrange(mytable, id, time)
    setDT(mytable)[, record := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = list(id)]
cappuccino
  • 325
  • 3
  • 13
  • [Cumulatively paste (concatenate) values grouped by another variable](https://stackoverflow.com/questions/24862046/cumulatively-paste-concatenate-values-grouped-by-another-variable) – Henrik May 29 '17 at 09:19
  • 1
    How would I deal with the time variable? setDT(mytable)[, content2 := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = c('id')] This works, but not: setDT(mytable)[, content2 := sapply(seq_len(.N), function(x) paste(comments[seq_len(x)], collapse = " ")), by = c('id', 'time')] – cappuccino May 29 '17 at 09:23

0 Answers0