(This was posted previously at the data-table-help mailing list, but it's been a few weeks without comment, and I did a little more to try to debug it.)
I ran into a strange error that an internet search turns up only in the commit log of data.table:
# Error in dcast.data.table(test.table, as.formula(paste(class.col, "+", :
# retFirst must be integer vector the same length as nrow(i)
This came up when running a previously tested, working dcast.data.table expression on a data.table that I had subset by randomly resampling Trial with replacement. The offending call is this:
dcast.data.table(test.table,
Class + Time + Trial ~ Channel,
value.var = "Voltage",
fun.aggregate=identity)
It seems to be choking on near-duplicate rows in the input table, that is, rows that are identical in every column except id (the error is the same with or without the id column present in the table):
test.table <- structure(list(Trial = c(1169L, 1169L), Sample = c(155L, 155L
), Class = c(1L, 1L), Subject = structure(c(13L, 13L), .Label = c("s01",
"s02", "s03", "s04", "s05", "s06", "s07", "s08", "s09", "s10",
"s11", "s12", "s13"), class = "factor"), Channel = c(1L, 1L),
Voltage = structure(c(-0.992322316444497, -0.992322316444497
), "`scaled:center`" = -6.23438399446429e-16, "`scaled:scale`" = 1),
Time = c(201.149466192171, 201.149466192171), Baseline = c(0.688151312347969,
0.688151312347969), id = 1:2), .Names = c("Trial", "Sample",
"Class", "Subject", "Channel", "Voltage", "Time", "Baseline",
"id"), class = c("data.table", "data.frame"), row.names = c(NA,
-2L), sorted = "id")
test.table
# Trial Sample Class Subject Channel Voltage Time Baseline id
# 1: 1169 155 1 s13 1 -0.9923223 201.1495 0.6881513 1
# 2: 1169 155 1 s13 1 -0.9923223 201.1495 0.6881513 2
dcast.data.table(test.table,
Class + Time + Trial ~ Channel,
value.var = "Voltage",
fun.aggregate=identity)
# Error in dcast.data.table(test.table, Class + Time + Trial ~ Channel, :
# retFirst must be integer vector the same length as nrow(i)
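To check the duplicate-row suspicion, here is a minimal sketch that counts how many rows fall into each combination of the columns named in the formula. It assumes a reasonably current data.table; `dt` is a reduced stand-in for the table above containing only the columns the formula uses, and `cell.counts` is a name introduced just for this illustration.

```r
library(data.table)

# Reduced stand-in for the two-row table above (assumption: only the
# columns used by the dcast formula matter for this check).
dt <- data.table(
  Class   = c(1L, 1L),
  Time    = c(201.149466192171, 201.149466192171),
  Trial   = c(1169L, 1169L),
  Channel = c(1L, 1L),
  Voltage = c(-0.992322316444497, -0.992322316444497)
)

# Count rows per combination of the formula columns.
cell.counts <- dt[, .N, by = .(Class, Time, Trial, Channel)]

# Any N > 1 means several Voltage values would land in one output cell.
cell.counts[N > 1]
```

If this returns any rows, the formula columns do not uniquely identify the input rows, which matches the case that fails here.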
Changing a single value in one of the dcast formula columns, so that the two rows no longer share the same Class, Time, Trial and Channel, gets close to the output I am looking for:
test.table[2,Trial:=1170]
dcast.data.table(test.table,
Class + Time + Trial ~ Channel,
value.var = "Voltage",
fun.aggregate=identity)
# Class Time Trial 1
# 1: 1 201.1495 1169 -0.9923223
# 2: 1 201.1495 1170 -0.9923223
What's bothering data.table? I tried changing keys and messing with the order of the formula terms just to see, because I don't understand the error, but that didn't work.
If I replace the function call with regular dcast from reshape2, I get a seemingly unrelated error:
# Error in vapply(indices, fun, .default) : values must be length 0, but FUN(X[[29]]) result is length 1
At this point in my code I don't care if the Trial values are correct, so I could work around this by replacing Trial in the formula with id, but I'm interested in a more general or robust solution.
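One direction that looks more general, sketched here under assumptions rather than as a confirmed fix: number the repeated rows within each formula cell and add that counter to the left-hand side, so every cell holds exactly one value and no fun.aggregate is needed. The column name `rep.id` is hypothetical, `dt` is a reduced stand-in for my table, and the sketch assumes a data.table version that exports `dcast()` directly (my code above calls dcast.data.table).

```r
library(data.table)

# Reduced stand-in for the failing two-row table (formula columns only).
dt <- data.table(
  Class   = c(1L, 1L),
  Time    = c(201.149466192171, 201.149466192171),
  Trial   = c(1169L, 1169L),
  Channel = c(1L, 1L),
  Voltage = c(-0.992322316444497, -0.992322316444497)
)

# Hypothetical helper column: 1, 2, ... within each group of rows that
# share the same Class, Time, Trial and Channel.
dt[, rep.id := seq_len(.N), by = .(Class, Time, Trial, Channel)]

# With rep.id on the left-hand side each output cell receives one value,
# so the call needs no aggregation function.
wide <- dcast(dt, Class + Time + Trial + rep.id ~ Channel,
              value.var = "Voltage")
wide
```

Unlike swapping Trial for id, this keeps the original Trial values in the result and only adds a column that distinguishes the resampled copies.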