R dcast and dates

Question

I have a simple data frame

    SubjectID ImgTP   ScanDate
1      01-001    B0 2019-06-28
2      01-001    PO 2019-06-30
3      01-001   W02 2019-07-09
4      01-001   W06 2019-08-15
5      01-001   W12 2019-09-13
6      01-001   W24 2019-12-13
7      01-001 INT01 2019-07-25
8      01-001 INT02 2019-10-25
9      01-002    B0 2019-07-25
10     01-002    PO 2019-07-26
11     01-002   W02 2019-08-15
12     01-002   W06 2019-09-05
13     01-002   W12 2019-10-24
14     01-002   W24 2020-01-23

I want to create a table the has SubjectID vs TP and displays date in the cell.

basic_table <- dcast(myDataArm,  SubjectID ~ ImgTP, value.var = "ScanDate",
                      fun.aggregate = function(x) lubridate::as_date(x), fill = 0)

However the dates in the table are reported as numbers. Why is it not displaying correctly?

i'm assuming you are using the `reshape2` package, it seems like a [known issue](https://stackoverflow.com/questions/12289731/posixct-values-become-numeric-in-reshape2-dcast). I don't think this package is in development anymore, and so you might try whatever flavor of the month hadley uses in the `tidyverse`. interestingly, if you convert to a `data.table`: `x <- data.table::as.data.table(myDataArm)`, then use `data.table::dcast(x, ...)`, you will get the right result. then you can `as.data.frame()` back if you want — rawr, Mar 20 '20 at 23:40
thanks, I understand the mistake now. It does not work as I needed, works best with reshape — Marina, Mar 23 '20 at 20:58

G. Grothendieck · Answer 1 · 2020-03-21T16:13:53.303

You could just use reshape in base R. It creates names of the form ScanDate.B0 so we remove everything prior to and including the dot in the second line to get nicer names.

r <- reshape(myDataArm, dir = "wide", idvar = "SubjectID", timevar = "ImgTP")
names(r) <- sub(".*\\.", "", names(r))
r
##   SubjectID         B0         PO        W02        W06        W12        W24      INT01      INT02
## 1    01-001 2019-06-28 2019-06-30 2019-07-09 2019-08-15 2019-09-13 2019-12-13 2019-07-25 2019-10-25
## 9    01-002 2019-07-25 2019-07-26 2019-08-15 2019-09-05 2019-10-24 2020-01-23       <NA>       <NA>

Note

The input in reproducible form is:

Lines <- "    SubjectID ImgTP   ScanDate
1      01-001    B0 2019-06-28
2      01-001    PO 2019-06-30
3      01-001   W02 2019-07-09
4      01-001   W06 2019-08-15
5      01-001   W12 2019-09-13
6      01-001   W24 2019-12-13
7      01-001 INT01 2019-07-25
8      01-001 INT02 2019-10-25
9      01-002    B0 2019-07-25
10     01-002    PO 2019-07-26
11     01-002   W02 2019-08-15
12     01-002   W06 2019-09-05
13     01-002   W12 2019-10-24
14     01-002   W24 2020-01-23"
myDataArm <- read.table(text = Lines)
myDataArm$ScanDate <- as.Date(myDataArm$ScanDate)

Edward · Answer 2 · 2020-03-21T00:16:20.600

The pivot_wider function from tidyr is one option.

library(tidyr)

  pivot_wider(myDataArm, names_from=ImgTP, values_from=ScanDate)

# A tibble: 2 x 9
  SubjectID B0         PO         W02        W06        W12        W24        INT01      INT02     
  <chr>     <date>     <date>     <date>     <date>     <date>     <date>     <date>     <date>    
1 01-001    2019-06-28 2019-06-30 2019-07-09 2019-08-15 2019-09-13 2019-12-13 2019-07-25 2019-10-25
2 01-002    2019-07-25 2019-07-26 2019-08-15 2019-09-05 2019-10-24 2020-01-23 NA         NA

You could also use the reshape function, but "ScanDate" appears in the wide variable names.

reshape(myDataArm, direction="wide", idvar = "SubjectID", 
                   v.names="ScanDate", timevar="ImgTP")


  SubjectID ScanDate.B0 ScanDate.PO ScanDate.W02 ScanDate.W06 ScanDate.W12 ScanDate.W24 ...
1    01-001  2019-06-28  2019-06-30   2019-07-09   2019-08-15   2019-09-13   2019-12-13
9    01-002  2019-07-25  2019-07-26   2019-08-15   2019-09-05   2019-10-24   2020-01-23

not to be confused with `tidyr::spread(myDataArm, ImgTP, ScanDate)` !! — rawr, Mar 21 '20 at 00:20

R dcast and dates

2 Answers2

Note