I think the fundamental answer to your question is that Hadley Wickham, when writing tibble 1.0, wanted consistent behavior of the [ operator. This decision is discussed, somewhat indirectly, in Wickham's Advanced R in the chapter on Subsetting:
It’s important to understand the distinction between simplifying and
preserving subsetting. Simplifying subsets returns the simplest
possible data structure that can represent the output, and is useful
interactively because it usually gives you what you want. Preserving
subsetting keeps the structure of the output the same as the input,
and is generally better for programming because the result will always
be the same type. Omitting drop = FALSE when subsetting matrices and
data frames is one of the most common sources of programming errors.
(It will work for your test cases, but then someone will pass in a
single column data frame and it will fail in an unexpected and unclear
way.)
Here, we can clearly see that Hadley is concerned with the inconsistent default behavior of [.data.frame, which explains why he would choose to change that behavior in tibble.
With the above terminology in mind, it's easy to see that whether [.data.frame produces a simplifying subset or a preserving subset by default depends on the input rather than on the code. For example, take a data frame data_df and subset it:
data_df <- data.frame(a = runif(10), b = letters[1:10])
data_df[, 2]
data_df[, 1:2]
You get a vector in one case and a data frame in the other. To predict the type of the output, you either have to know in advance how many columns will be selected (i.e. you have to know length(list_of_columns), which may come from user input), or you have to add the drop = FALSE argument explicitly. So the following two calls produce the same class of object, although the added argument is redundant in the second call (and may be unknown to the majority of R users):
data_df[, 2, drop = FALSE]
data_df[, 1:2, drop = FALSE]
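To make the failure mode from the Advanced R quote concrete, here is a minimal sketch. The helpers count_rows() and count_rows_safe() are hypothetical names invented for this illustration; they are not part of base R or of any package.

```r
# Hypothetical helper: counts the rows of the selected columns.
# It relies on the default (simplifying) behavior of `[.data.frame`.
count_rows <- function(df, cols) {
  nrow(df[, cols])
}

# Same helper, but with preserving subsetting.
count_rows_safe <- function(df, cols) {
  nrow(df[, cols, drop = FALSE])
}

data_df <- data.frame(a = runif(10), b = letters[1:10])

count_rows(data_df, c("a", "b"))   # two columns: still a data frame, so 10
count_rows(data_df, "b")           # one column: a vector, so nrow() gives NULL
count_rows_safe(data_df, "b")      # one column: still a data frame, so 10
```

The first helper passes any test that selects two or more columns and only misbehaves when a caller happens to select exactly one.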
With tibble (or dplyr), we get consistent behavior by default, so we can be sure of getting the same class of object when subsetting with the [ operator, no matter how many columns we return:
library(tibble)
data_df <- tibble(a = runif(10), b = letters[1:10])
data_df[, 2]
data_df[, 1:2]
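If you want to check this yourself, the sketch below tests the result of each subset and shows one way to get a plain vector when that is what you actually need. The variable names data_tbl, one_col, two_cols and b_vec are made up for this example, and it assumes the tibble package is installed.

```r
library(tibble)

data_tbl <- tibble(a = runif(10), b = letters[1:10])

one_col  <- data_tbl[, 2]     # single column
two_cols <- data_tbl[, 1:2]   # two columns

is_tibble(one_col)    # TRUE: not simplified to a vector
is_tibble(two_cols)   # TRUE

# Extracting a vector is an explicit request, via [[ or $.
b_vec <- data_tbl[["b"]]
is.character(b_vec)            # TRUE
identical(b_vec, data_tbl$b)   # TRUE
```

The design choice is that [ always preserves the structure, while [[ and $ are the operators that extract a single column as a vector.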