I recently asked a different question here on Stack Overflow; it can be found here [1]. While the suggested solutions were very concise, the results contained a serious error that we could not resolve. Long story short: I have a data frame in the following form:
df <- structure(list(X = c("A", "A", "B", "C", "C"), Y = c(1L, 2L,
3L, 1L, 3L)), .Names = c("X", "Y"), class = "data.frame", row.names = c(NA,
-5L))
I want a list like this:
$`A`
[1] 1 2
$`B`
[1] 3
$`C`
[1] 1 3
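For reference, the target shape can be sketched on the small example with base R's `split()`. This is only an illustration of the desired result, not the approach used on the full 22 million rows; the name `wanted` is made up for this sketch.

```r
# Example data from above, rebuilt with data.frame() for readability
df <- data.frame(X = c("A", "A", "B", "C", "C"),
                 Y = c(1L, 2L, 3L, 1L, 3L),
                 stringsAsFactors = FALSE)

# split() groups the values of Y by the levels of X and returns a named list
wanted <- split(df$Y, df$X)
print(wanted)
```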
@akrun suggested using data.table since my data has 22 million rows. Accordingly, I used the following code.
library(data.table)
DT <- as.data.table(df)
# collect all Y values of each X group into a single list element
DT1 <- DT[, list(Y = list(Y)), by = X]
DT1$Y
However, my Y is a factor, and while the code works for an integer column, it does not work for a factor. I get the following result with the example data set, with the full 22 million rows, and with a subsample of 200 rows. Every group ends up with the same values (those of the last group, C):
DT1$Y
#[[1]]
#[1] 1 3
#[[2]]
#[1] 1 3
#[[3]]
#[1] 1 3
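A minimal sketch of how the factor case might be reproduced and checked. It assumes the example data with Y converted to a factor; the names `df_f`, `expected`, `DT_f`, `DT1_f` and `got` are made up for this sketch. The expected grouping is computed with base R and compared as character, so the check does not depend on how factor levels are carried over. Whether the comparison prints TRUE or FALSE may depend on the data.table version installed.

```r
# Example data, but with Y stored as a factor (assumption: this mirrors the real data)
df_f <- data.frame(X = c("A", "A", "B", "C", "C"),
                   Y = factor(c(1L, 2L, 3L, 1L, 3L)),
                   stringsAsFactors = FALSE)

# Expected values per group, computed with base R and stored as character
expected <- split(as.character(df_f$Y), df_f$X)

# Run the data.table version only if the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  library(data.table)
  DT_f <- as.data.table(df_f)
  DT1_f <- DT_f[, list(Y = list(Y)), by = X]

  # Convert each group's factor to character and name the list by group
  got <- lapply(DT1_f$Y, as.character)
  names(got) <- DT1_f$X

  # TRUE means the grouping is correct; FALSE reproduces the problem described above
  print(identical(got, expected))
}
```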
Does anyone know why? I am using R 3.1.1 and data.table 1.9.2.