This one’s quite weird. I’m not sure whether I’m missing something or whether it’s a bug in data.table or fread.
I’m trying to “stretch” a data table holding a time series in which one time point is missing. When the table is read from a file, the X[Y] join fills in NAs not only for the missing row but also for other rows where data points are present. This happens only when the t column used for keying contains floats rather than integers.
library(data.table)
# This works fine; empty row at t=0.5
# is filled with NA after join
dt = data.table(id = as.integer(rep(0, 10)),
                t = seq(0.1, 1, 0.1),
                y = 1:10,
                key = "id,t")
dt = dt[!(t == 0.5)]
dtAux = dt[,
           .(seq(min(t), max(t), 0.1)),
           by = id]
setkey(dtAux, id, V1)
dt[dtAux]
    id   t  y
 1:  0 0.1  1
 2:  0 0.2  2
 3:  0 0.3  3
 4:  0 0.4  4
 5:  0 0.5 NA
 6:  0 0.6  6
 7:  0 0.7  7
 8:  0 0.8  8
 9:  0 0.9  9
10:  0 1.0 10
# This fails; NAs are created in multiple rows
fwrite(dt, "test.csv", row.names = FALSE)
dtFromFile = fread("test.csv")
setkey(dtFromFile, id, t)
dtAux = dtFromFile[,
                   .(seq(min(t), max(t), 0.1)),
                   by = id]
setkey(dtAux, id, V1)
dtFromFile[dtAux]
    id   t  y
 1:  0 0.1  1
 2:  0 0.2  2
 3:  0 0.3 NA
 4:  0 0.4  4
 5:  0 0.5 NA
 6:  0 0.6  6
 7:  0 0.7 NA
 8:  0 0.8  8
 9:  0 0.9  9
10:  0 1.0 10
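One possible explanation, offered as an assumption rather than a confirmed diagnosis: the join compares doubles exactly, and the values that seq() computes as from + k * by are not always bit-for-bit equal to the values parsed from the decimal text in the CSV. In the first example both sides of the join come from the same seq() arithmetic, so they agree; after the round trip through the file they may not. The sketch below checks this in base R and then shows one hedged workaround, joining on an integer index instead of the float. The names parsed, computed, dtFile, grid and tick are made up for illustration.

```r
library(data.table)

# Values as they would be parsed from the decimal text "0.1", "0.2", ...
parsed <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)
# Values as computed by seq(), i.e. from + k * by in floating point
computed <- seq(0.1, 1, 0.1)

# Positions where the two doubles are not exactly equal; these are
# expected to line up with the unexpected NA rows in the output above
mismatch <- which(parsed != computed)
print(mismatch)
print(format(computed[mismatch], digits = 17))

# Possible workaround (an assumption, not the only option): build an
# integer "tick" index from t and join on that instead of the double
dtFile <- data.table(id = 0L, t = parsed[-5], y = (1:10)[-5])
dtFile[, tick := as.integer(round(t * 10))]

# Full grid of ticks per id, then an X[Y] join on the integer columns
grid <- dtFile[, .(tick = seq(min(tick), max(tick))), by = id]
res <- dtFile[grid, on = .(id, tick)]

# Rebuild t from the tick so the filled-in row also has a time value
res[, t := tick / 10]
print(res)
```

With this approach only the genuinely missing time point should come back with NA in y. Rounding both key columns to the same number of decimals before keying would be a similar option, at the cost of still joining on doubles.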
Tested on R 3.6.1 with data.table 1.12.4.
> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bullseye/sid
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.8.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.12.4
loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1