If I want to list all rows of a column in a dataset in R, I am able to do it in these two ways:

> dataset[,'column'] 
> dataset$column

It appears that both give me the same result. What is the difference?

user3422637
  • Take a look [here](http://stackoverflow.com/questions/18222286/select-a-data-frame-column-using-and-the-name-of-the-column-in-a-variable) – David Arenburg Oct 12 '14 at 23:27

2 Answers

In practice, not much, as long as dataset is a data frame. The main difference is that the dataset[, "column"] form evaluates its argument, so it accepts a variable, as in j <- "column"; dataset[, j]. By contrast, dataset$j looks for a column literally named j, which is not what you want.
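A minimal sketch of that difference, using a small made-up data frame (the column names and values here are illustrative assumptions, not from the question):

```r
# Hypothetical example data; names and values are illustrative only.
dataset <- data.frame(column = c(10, 20, 30), other = c("a", "b", "c"))

j <- "column"

by_bracket <- dataset[, j]  # j is evaluated, so this selects the "column" column
by_dollar  <- dataset$j     # looks for a column literally named "j"

print(by_bracket)           # the values of dataset$column
print(is.null(by_dollar))   # TRUE: there is no column called "j"
```

With a plain data frame the `$` lookup fails silently by returning NULL, which is why the mistake can be easy to miss.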

dataset$column is list syntax and dataset[ , "column"] is matrix syntax. Data frames are really lists, where each list element is a column and every element has the same length. This is why length(dataset) returns the number of columns. Because they are "rectangular," we are able to treat them like matrices, and R kindly allows us to use matrix syntax on data frames.
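The list-like nature of a data frame can be checked directly. This is a small sketch on made-up data (the data frame below is an assumption for illustration):

```r
# Hypothetical example data; names and values are illustrative only.
dataset <- data.frame(column = 1:3, other = c("a", "b", "c"))

is_a_list   <- is.list(dataset)   # a data frame is a list of columns
n_elements  <- length(dataset)    # number of list elements, i.e. columns
col_lengths <- lengths(dataset)   # every column has the same length (the row count)

# List syntax and matrix syntax reach the same column.
same_column <- identical(dataset$column, dataset[, "column"])

print(is_a_list)
print(n_elements == ncol(dataset))
print(col_lengths)
print(same_column)
```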

Note that, for lists, list$item and list[["item"]] are almost synonymous. Again, the biggest difference is that the latter form evaluates its argument, whereas the former does not. This is true even in the form `$`(list, item), which is exactly equivalent to list$item. In Hadley Wickham's terminology, $ uses "non-standard evaluation."
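As a rough illustration on a plain list (the names `lst`, `item`, and `key` are hypothetical), the sketch below shows that `$` takes the name as written, while `[[` evaluates whatever it is given:

```r
lst  <- list(item = 42)
item <- "something_else"  # a variable that happens to share the element's name
key  <- "item"

via_dollar   <- lst$item        # uses the literal name "item"
via_function <- `$`(lst, item)  # function form: still the literal name "item"
via_double   <- lst[[key]]      # key is evaluated to "item"
via_variable <- lst[[item]]     # item is evaluated to "something_else"

print(via_dollar)
print(via_function)
print(via_double)
print(is.null(via_variable))    # no element has that name
```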

Also, as mentioned in the comments, $ always uses partial name matching, [[ does not by default (but allows it with exact = FALSE), and [ does not allow it at all.
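The three matching behaviours can be compared side by side. This is a sketch on a one-column data frame made up for the purpose; the `tryCatch` wrapper is only there so the failing `[` call does not stop the script:

```r
# Hypothetical example data; the name "column" is illustrative only.
dataset <- data.frame(column = 1:3)

dollar_partial <- dataset$col                      # partial match succeeds
double_exact   <- dataset[["col"]]                 # exact matching by default
double_partial <- dataset[["col", exact = FALSE]]  # partial matching on request
single_bracket <- tryCatch(
  dataset[, "col"],                                # no partial matching at all
  error = function(e) "error"
)

print(dollar_partial)
print(is.null(double_exact))
print(double_partial)
print(single_bracket)
```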

I recently answered a similar question with some additional details that might interest you.

shadowtalker
  • It's not the only difference. Don't forget that the `$` method allows for partial name matching. So, one could (but shouldn't) do `dataset$col` and get the values for `dataset$column`. That can cause issues for the unwary. – hrbrmstr Oct 12 '14 at 23:47
  • Actually I _did_ forget, what with tab completion and a healthy mistrust of partial matching. – shadowtalker Oct 13 '14 at 00:24
  • `[[` supports partial matching, e.g.: `dat <- data.frame(variable1=1:3); dat[["v",exact=FALSE]]` The help file at `?Extract` even goes through this: _"‘x$name’ is equivalent to ‘x[["name", exact = FALSE]]’"_ – thelatemail Oct 13 '14 at 01:19
  • @thelatemail more things I didn't know. Edited – shadowtalker Oct 13 '14 at 01:26
  • Ohhhh, that's what `exact` does in a list index?!?! I never really took time to learn that but knew it was there. Woohoo for thelatemail! – Rich Scriven Oct 13 '14 at 01:29
  • So I just went back and read through the docs. Turns out that `[[` also has an argument `j`, so you can use matrix syntax with it as well. Now _I'm_ confused as to what the difference is. – shadowtalker Oct 13 '14 at 01:44
  • @ssdecontrol - it's mainly that `[` allows multiple selections. – thelatemail Oct 13 '14 at 01:51
  • Just my opinion but I stopped using data$var as the partial name matching can lead to badness in coding and unexpected results. – Tyler Rinker Oct 13 '14 at 01:54
  • @thelatemail of course. That's enough thinking for today. – shadowtalker Oct 13 '14 at 02:00

Use the `str` function to see the difference:

> mydf
  user_id Gender Age
1       1      F  13
2       2      M  17
3       3      F  13
4       4      F  12
5       5      F  14
6       6      M  16
> 
> str(mydf)
'data.frame':   6 obs. of  3 variables:
 $ user_id: int  1 2 3 4 5 6
 $ Gender : Factor w/ 2 levels "F","M": 1 2 1 1 1 2
 $ Age    : int  13 17 13 12 14 16
> 
> str(mydf[1])
'data.frame':   6 obs. of  1 variable:
 $ user_id: int  1 2 3 4 5 6
> 
> str(mydf[,1])
 int [1:6] 1 2 3 4 5 6
> 
> str(mydf[,'user_id'])
 int [1:6] 1 2 3 4 5 6

> str(mydf$user_id)
 int [1:6] 1 2 3 4 5 6
> 
> str(mydf[[1]])
 int [1:6] 1 2 3 4 5 6
> 
> str(mydf[['user_id']])
 int [1:6] 1 2 3 4 5 6

mydf[1] is a data frame, while mydf[,1], mydf[,'user_id'], mydf$user_id, mydf[[1]], and mydf[['user_id']] are all vectors.
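If a data frame is wanted from the matrix-style form as well, `drop = FALSE` keeps the result from being simplified to a vector. The sketch below rebuilds `mydf` from the printed output above; the construction code itself is an assumption, since the original does not show how `mydf` was created:

```r
# Reconstruction of mydf from the printed values (construction is assumed).
mydf <- data.frame(
  user_id = 1:6,
  Gender  = factor(c("F", "M", "F", "F", "F", "M")),
  Age     = c(13L, 17L, 13L, 12L, 14L, 16L)
)

as_vector <- mydf[, "user_id"]                # simplified to an integer vector
as_frame  <- mydf[, "user_id", drop = FALSE]  # stays a one-column data frame

str(as_vector)
str(as_frame)
```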

rnso
  • You can use `drop=FALSE` with a couple of those if you want to get a `data.frame` back. – GSee Oct 13 '14 at 01:37