I have been able to create a table with the median
mean
and sd
, based on several conditions, using the dplry
package. To do so I prepped the data. It looks a bit sloppy to me and I am using a big dataset (>3gb) so I have been using data.table
already.
I am wondering how I can create a similar table more efficiently using the data.table
package, so, if it is possible to do this without prepping the data too much. That would be great as I need to do this for many other conditions as well. Btw, I checked out this post but it didn't add conditions.
This is what I have and need:
I need the mean
of con
by id
and date
= 1 year since first i
or i2
= A04
or A01
.
EDIT (because it wasn't running before):
DATA:
DT <- structure(list(id = c(123L, 123L, 332L, 332L, 332L, 100L, 100L, 113L, 113L, 113L, 113L, 551L, 551L),
i = c("D95", "F85", "A01", "A04", "K20", "B10", "A04", "T08", "P28", "D95", "A04", "B12", "D95"),
i2 = c("F15", "", "", "", "", "", "", "", "", "A01", "", "A01", ""),
date = c("2015-06-19", "2016-08-15", "2013-03-16", "2017-01-17", "2013-01-16", "2009-05-08", "2011-04-03", "2015-05-04",
"2011-04-04", "2017-08-04", "2011-05-24", "2013-11-04", "2013-05-04"),
con = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"),
PS = c("0", "0", "1", "0", "0", "0", "1", "0", "0", "0", "1", "1", "0")),
class = "data.frame", row.names = c(NA, -13L))
The following is what I have done so far (and which works):
I created the columns PS
(= having either A01
or A04
in i
or i2
) and ds
(=days since first A01
or A01
(days aren't correct here)) based on the data.
With the dplyr
package I first made a data.table
with only rows with only ds
between -365&0.
j.ds <- subset(DT, ds >= -365 & ds < 0)
Than agregated to a table with sum
of con
per id
, like so:
j <- j.ds %>%
group_by(id) %>%
summarise(con = sum(con))
From there I made the desired table:
jP <- j %>%
summarise(median = median(con), mean = mean(con), SD = sd(con))