I am trying to subset a data.table to its numeric columns so that I can compute a five-number summary on the numeric variables only. However, I also need to group by an id variable. The approach I tried does not let me use the numeric subset together with the id variable, because id is not part of that subset. I know that data.table has .SD, but I cannot find the right combination of apply functions and grouping. The id variable is not numeric and cannot be coerced to numeric; it is also not unique in my data table.
Here is what I have tried:
library(data.table)
library(magrittr)
dt <- data.table(num1 = rep(1, 10),
                 num2 = rep(2, 10),
                 num3 = rep(100, 10),
                 id = c("1a", "2b", "2h", "3b", "4b", "5b", "5b", "7n", "8mn", "9y"),
                 char1 = rep("a", 10),
                 char2 = rep("b", 10))
numeric_variables <-
  lapply(dt, is.numeric) %>%
  unlist() %>%
  as.vector()
dt[, numeric_variables, with = FALSE]
dt_summary <-
  apply(dt[, numeric_variables, with = FALSE][, grep("num",
                                                     names(dt[, numeric_variables, with = FALSE]),
                                                     value = TRUE),
                                               with = FALSE],
        2,
        fivenum) %>%
  as.data.frame()
rownames(dt_summary) <-
  c("Min", "Q1", "Med", "Q3", "Max")
dt_summary
dt[, .(numeric_variables, id), with = FALSE]
The final line does not work because id is not in the numeric_variables category I created. If someone could point me to the correct use of by or tapply together with .SD, I would appreciate it.
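For reference, here is a minimal sketch of the direction I suspect is right, not a confirmed solution. It assumes the numeric columns are tracked by name rather than as a logical vector, so that "id" can be appended to the selection, and it passes those names to .SDcols while grouping with by. The names num_cols, dt_num_id and stat_labels are hypothetical and not part of my program.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(num1 = rep(1, 10),
                 num2 = rep(2, 10),
                 num3 = rep(100, 10),
                 id = c("1a", "2b", "2h", "3b", "4b", "5b", "5b", "7n", "8mn", "9y"),
                 char1 = rep("a", 10),
                 char2 = rep("b", 10))

# Names (not a logical mask) of the numeric columns
num_cols <- names(dt)[vapply(dt, is.numeric, logical(1))]

# Numeric columns together with id: combine the names into one character vector
dt_num_id <- dt[, c(num_cols, "id"), with = FALSE]

# Five-number summary of each numeric column within each id.
# stat_labels (hypothetical) labels the five rows produced per group.
stat_labels <- c("Min", "Q1", "Med", "Q3", "Max")
dt_summary <- dt[, c(list(stat = stat_labels), lapply(.SD, fivenum)),
                 by = id, .SDcols = num_cols]
dt_summary
```

The result is in long form: five rows per id, with a stat column identifying each row, rather than the row-named data frame from my attempt.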
NOTE: This is part of a larger program where the user can either select one id to look at or compare two ids at once. So it needs to work for either one or many groups (eventually).
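To make the one-or-many requirement concrete, this is roughly the interface I have in mind, written as a sketch under the assumption that filtering rows with %in% before grouping is acceptable. The function summarise_ids and its argument selected_ids are hypothetical names, and the grouping column is assumed to be called id.

```r
library(data.table)

# Same example data as in the question
dt <- data.table(num1 = rep(1, 10),
                 num2 = rep(2, 10),
                 num3 = rep(100, 10),
                 id = c("1a", "2b", "2h", "3b", "4b", "5b", "5b", "7n", "8mn", "9y"),
                 char1 = rep("a", 10),
                 char2 = rep("b", 10))

# Hypothetical helper: five-number summary per id, restricted to the ids
# the user picked. Works the same for one id or for several.
summarise_ids <- function(data, selected_ids) {
  num_cols <- names(data)[vapply(data, is.numeric, logical(1))]
  stat_labels <- c("Min", "Q1", "Med", "Q3", "Max")
  data[id %in% selected_ids,
       c(list(stat = stat_labels), lapply(.SD, fivenum)),
       by = id, .SDcols = num_cols]
}

one_id  <- summarise_ids(dt, "5b")           # a single group
two_ids <- summarise_ids(dt, c("2b", "5b"))  # two groups side by side in long form
```

Because the grouping is done by by, the same call handles any number of selected ids without a separate code path.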