For a package I am currently writing, I have to subset rows of a data frame that can have one or more columns. After hours of debugging, I found out that R behaves differently when a data.frame has one column as opposed to several:

df1 <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 4))
df2 <- data.frame(col1 = c(1, 2, 3))
class(df1[1, ])
#> [1] "data.frame"
class(df2[1, ])
#> [1] "numeric"

This is so annoying, and I would have to add if-statements to handle it, which I don't want to do. Can someone tell me why that is, and how I can turn it off?

  • Just include the `drop = FALSE` argument: `df2[1,,drop=FALSE]`. – lmo Aug 24 '17 at 16:55
  • AH. THANK YOU. why the hell is this even an option? – Stanislaus Stadlmann Aug 24 '17 at 17:01
  • In general, R simplifies objects as much as it can by default. This may date back to the days when memory was a big issue, but I'm not sure. It can be handy when you expect it, since simpler objects are usually easier to work with. However, in situations such as yours, it can lead to unintended outcomes. – lmo Aug 24 '17 at 17:06
  • R functions tend to simplify things when possible to remove layers of complexity to give you the simplest objects to work with, which in most cases makes things easier...just not this one! – sconfluentus Aug 24 '17 at 17:06
  • @Stan125. You may consider working with `data.table`s instead. See e.g. [here](https://cran.r-project.org/web/packages/data.table/vignettes/datatable-faq.html#SmallerDiffs): "In `[.data.frame` we very often set `drop = FALSE`. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column `data.frame`. In `[.data.table` we took the opportunity to make it consistent and dropped `drop`." – Henrik Aug 24 '17 at 17:32
  • @Henrik Thanks for the suggestion, but when I write a new package I want to depend on as few packages as possible. – Stanislaus Stadlmann Aug 24 '17 at 17:34
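
For reference, here is a minimal base-R sketch of the `drop = FALSE` fix suggested in the comments above, applied to the data frames from the question. The wrapper `subset_rows` is a hypothetical helper name used only for illustration. It is not part of base R or any package.

```r
df1 <- data.frame(col1 = c(1, 2, 3), col2 = c(2, 3, 4))
df2 <- data.frame(col1 = c(1, 2, 3))

# Default behaviour: a single-column result is simplified to a plain vector
class(df2[1, ])
#> [1] "numeric"

# drop = FALSE keeps the data.frame structure regardless of column count
class(df2[1, , drop = FALSE])
#> [1] "data.frame"
class(df1[1, , drop = FALSE])
#> [1] "data.frame"

# Hypothetical helper (the name is an assumption) wrapping the pattern,
# so package code does not need if-statements on the number of columns
subset_rows <- function(df, rows) {
  df[rows, , drop = FALSE]
}

res <- subset_rows(df2, 2:3)
class(res)
#> [1] "data.frame"
```

Wrapping the call this way keeps the row-subsetting logic in one place without adding a package dependency.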
