
I am using data.table 1.10.0.

# install.packages("install.load") # install in order to use the load_package function
install.load::load_package("data.table", "gsubfn", "fpCompare")

# function to convert fractions, mixed numbers, and integers to numeric (decimal) values
# Source 1 begins
to_numeric <- function(n) {
    p <- c(if (length(n) == 2) 0, as.numeric(n), 0:1)
    p[1] + p[2] / p[3]
}
# Source 1 ends

Source 1 is the question "Convert a character vector of mixed numbers, fractions, and integers to numeric".
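To check how Source 1's helper handles each kind of input, here is a minimal sketch that repeats the function and calls it on the digit strings that a fraction, a mixed number, and an integer produce. The leading 0 and the trailing 0:1 are padding, so every input ends up as whole + numerator / denominator.

```r
# Source 1's helper: a length-2 input (numerator, denominator) gets a
# leading 0 for the whole-number part; 0:1 supplies defaults so that a
# single integer becomes n + 0/1.
to_numeric <- function(n) {
  p <- c(if (length(n) == 2) 0, as.numeric(n), 0:1)
  p[1] + p[2] / p[3]
}

to_numeric(c("3", "8"))       # fraction 3/8       -> 0.375
to_numeric(c("1", "1", "2"))  # mixed number 1 1/2 -> 1.5
to_numeric("2")               # integer 2          -> 2
```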

max_size_aggr <- 3 / 4

water_nonair <- structure(list(
    `Slump (in.)` = c("1 to 2", "3 to 4", "6 to 7",
        "Approximate amount of entrapped air in nonair- entrained concrete (%)"),
    `3/8 in.` = c(350, 385, 410, 3),
    `1/2 in.` = c(335, 365, 385, 2.5),
    `3/4 in.` = c(315, 340, 360, 2),
    `1 in.` = c(300, 325, 340, 1.5),
    `1 1/2 in.` = c(275, 300, 315, 1),
    `2 in.` = c(260, 285, 300, 0.5),
    `3 in.` = c(220, 245, 270, 0.3),
    `6 in.` = c(190, 210, NA, 0.2)),
    .Names = c("Slump (in.)", "3/8 in.", "1/2 in.", "3/4 in.", "1 in.",
        "1 1/2 in.", "2 in.", "3 in.", "6 in."),
    row.names = c(NA, -4L),
    class = c("data.table", "data.frame"))

setnames(water_nonair, c("Slump (in.)", "3/8 in.", "1/2 in.", "3/4 in.", "1 in.",
"1 1/2 in.", "2 in.", "3 in.", "6 in."))

water_nonair_col_numeric <- gsub(" in.", "", colnames(water_nonair)[2:ncol(water_nonair)], fixed = TRUE)

water_nonair_col_numeric <- sapply(strapplyc(water_nonair_col_numeric, "\\d+"), to_numeric)
# Source 1
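A sketch of the header conversion without gsubfn, assuming base R's regmatches() with gregexpr() as a stand-in for strapplyc(); both return one character vector of digit runs per string. The vector sizes is a hypothetical copy of the size headers, with the " in." suffix left in place because the "\\d+" pattern ignores it anyway.

```r
# Assumption: regmatches()/gregexpr() replace gsubfn::strapplyc() here.
to_numeric <- function(n) {
  p <- c(if (length(n) == 2) 0, as.numeric(n), 0:1)
  p[1] + p[2] / p[3]
}

# Hypothetical copy of the size column headers.
sizes <- c("3/8 in.", "1/2 in.", "3/4 in.", "1 in.", "1 1/2 in.",
           "2 in.", "3 in.", "6 in.")

# Extract the digit runs, e.g. c("1", "1", "2") for "1 1/2 in."
digits <- regmatches(sizes, gregexpr("\\d+", sizes))
sizes_numeric <- sapply(digits, to_numeric)
sizes_numeric
# [1] 0.375 0.500 0.750 1.000 1.500 2.000 3.000 6.000
```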

New way (data.table 1.10.0)

water_nonair_column <- which(water_nonair_col_numeric %==% max_size_aggr)+1L
# [1] 4
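The column lookup depends on a tolerant floating-point comparison. A minimal sketch, assuming a plain absolute-difference test as a substitute for fpCompare's %==% (the tolerance tol is a hypothetical choice), shows where the 4 comes from:

```r
# Assumption: an absolute-difference check stands in for fpCompare's %==%.
sizes_numeric <- c(0.375, 0.5, 0.75, 1, 1.5, 2, 3, 6)
max_size_aggr <- 3 / 4
tol <- 1e-8  # hypothetical tolerance chosen for illustration

# which() gives the position among the size columns (3 here); + 1L skips
# the leading "Slump (in.)" column to get the position in the full table.
water_nonair_column <- which(abs(sizes_numeric - max_size_aggr) < tol) + 1L
water_nonair_column
# [1] 4
```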

water_nonair[2, water_nonair_column][[1]]
# [1] 4

Why does the following work when I give the column index directly, while the code above, whose variable also has the value 4, does not?

water_nonair[2, 4][[1]]
# [1] 340

Old way (data.table 1.9.6)

water_nonair[2, which(water_nonair_col_numeric %==% max_size_aggr)+1L, with = FALSE][[1]]
# [1] 340

I removed with = FALSE from the call after reading the data.table NEWS for the 1.9.8 release.

iembry

1 Answer


The long note 3 in the v1.9.8 NEWS starts:

When j contains no unquoted variable names (whether column names or not), with= is now automatically set to FALSE. Thus ...

But your j does contain an unquoted variable name. In fact, it is solely an unquoted variable name. So that item does not apply to it.
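To illustrate the NEWS rule on a toy table (assuming data.table 1.9.8 or later is installed): a j made only of literals selects columns the way data.frame does, while a bare column name is evaluated as an expression inside the table.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)

# j contains no unquoted variable names, so with = FALSE is applied
# automatically and columns are selected, returning a data.table.
DT[, 2]            # one-column data.table holding b
DT[, c("a", "b")]  # both columns, as a data.table

# j is an unquoted symbol that is a column name: it is evaluated
# inside DT, so the column comes back as a plain vector.
DT[, b]            # 4 5 6
```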

That's what options(datatable.WhenJisSymbolThenCallingScope=TRUE) was about: it let you try out the new behaviour ahead of the planned change. Please read that same NEWS item again. If you set that option, it will work as you expected it to.

HOWEVER, please don't, because yesterday I changed it and that option has now gone in development. A migration timeline is no longer needed: the new strategy needs no code changes and has no breakage. Please see the new notes in the latest development NEWS for v1.10.1; I won't copy them here, to avoid duplication.

So going forward, when j is a symbol (i.e. an unquoted variable name), you either still need with=FALSE:

water_nonair[2, water_nonair_column, with=FALSE]

or you can use the new .. prefix, added to v1.10.1 yesterday:

water_nonair[2, ..water_nonair_column]
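Applied to the question, both forms return the expected value. A sketch on a hypothetical three-column cut of water_nonair (water_small and size_col are illustrative names), assuming data.table 1.10.2 or later, the first CRAN release with the .. prefix:

```r
library(data.table)

# Hypothetical cut-down version of water_nonair with three of its columns.
water_small <- data.table(
  `Slump (in.)` = c("1 to 2", "3 to 4", "6 to 7"),
  `1/2 in.` = c(335, 365, 385),
  `3/4 in.` = c(315, 340, 360)
)
size_col <- 3L  # position of the "3/4 in." column, held in a variable

water_small[2, size_col, with = FALSE][[1]]  # 340
water_small[2, ..size_col][[1]]              # 340, via the .. prefix
```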

Otherwise, if j is a symbol, it must be a column name, for safety, consistency and backwards compatibility. If it isn't, you'll now get the new, more helpful error message:

DT = data.table(a=1:3, b=4:6)
myCols = "b"
DT[,myCols]
Error in `[.data.table`(DT, , myCols) : 
  j (the 2nd argument inside [...]) is a single symbol but column name 
  'myCols' is not found. Perhaps you intended DT[,..myCols] or
  DT[,myCols,with=FALSE]. This difference to data.frame is deliberate 
  and explained in FAQ 1.1.
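The safety rule can be checked directly. This sketch, assuming data.table 1.10.2 or later, catches the error for a non-column symbol, selects with ..myCols instead, and shows that when a variable shares its name with a column (the b <- "a" line is a hypothetical clash), the bare symbol still refers to the column.

```r
library(data.table)

DT <- data.table(a = 1:3, b = 4:6)
myCols <- "b"

# A bare symbol that is not a column name raises the new error.
res <- tryCatch(DT[, myCols], error = function(e) "error")
res            # "error"

# The .. prefix looks myCols up in the calling scope instead.
DT[, ..myCols] # one-column data.table holding b

# Hypothetical clash: a variable with the same name as a column.
# For a bare symbol, the column takes precedence.
b <- "a"
DT[, b]        # 4 5 6 (column b, not the string "a")
```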

As mentioned in NEWS, I reran all 313 CRAN and Bioconductor packages that use data.table against data.table v1.10.1, and 2 of them do break with this change. But that is what we want, because they have a bug: the value of j in calling scope is being returned literally, which cannot be what was intended. I've informed their maintainers. This is exactly the kind of problem we wanted to reveal and fix. The other 311 packages all pass with this change. The check doesn't rely on test coverage (which is weak for many packages): the new error happens whenever j is a symbol that isn't a column, whether or not there's a test for the result.

Matt Dowle

  • Thank you for the detailed explanation and for pushing the new changes through to GitHub. Are you going to do a CRAN release soon with the new changes? – iembry Dec 09 '16 at 03:50
  • Probably in a month or so, once people have had a chance to test in dev and make sure it's OK. A compiled Windows binary for the latest passing v1.10.1 is available now via one command on the Installation wiki page. – Matt Dowle Dec 09 '16 at 06:37
  • This change made it to CRAN on 31 Jan 2017 (v1.10.2) – Matt Dowle May 07 '17 at 21:26
  • Thank you for the great news on the changes. – iembry May 16 '17 at 03:33