I am trying to subset a data frame by using a variable name. I have it working but there is a part which I don't quite understand.

Originally I have this: rownames (mtcars[mtcars$hp >150,]).

Then, rather than hard-coding "hp", I wanted to assign "hp" to a variable: foo <- "hp" and subset with that. I got it working using this: rownames (mtcars[mtcars[foo] >150,]). (Thanks to link which stopped me from playing with the $ operator.)

But, as I was building up this statement, I noticed there was a difference between the two. For mtcars$hp > 150, I get this output:

 [1] FALSE FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE
[13]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
[25]  TRUE FALSE FALSE FALSE  TRUE  TRUE  TRUE FALSE

For mtcars[foo] > 150, I get this:

                       hp
Mazda RX4           FALSE
Mazda RX4 Wag       FALSE
Datsun 710          FALSE
Hornet 4 Drive      FALSE
Hornet Sportabout    TRUE
...

Are these two of the same "type"? Is there any reason why R displays the first one without rownames and the second one with rownames?

Perhaps I've naively thought that $ and [] were more or less equivalent. I can get the same final result, but I am curious and worried if my assumptions had been wrong. "Fortunately", I ignored this difference and carried on and got the same final result.

Thank you!

Ray
  • In addition to the link in your post, check [here](https://stackoverflow.com/questions/1169456/the-difference-between-and-notations-for-accessing-the-elements-of-a-lis) – Henrik Aug 28 '17 at 12:23
  • If you google "r subsetting", you can find a lot of useful resources. There are three operators: `$`, `[`, `[[`, and you should learn when to use one or the other. Here, for instance, you likely don't want `[`, but rather `[[` (and the correct line should be `mtcars[mtcars[[foo]] >150,]`). – nicola Aug 28 '17 at 12:24
  • @nicola Thank you! I think I'll need to digest what others have said below to figure out why I'd want `[[` and not `[`. But thank you for the suggestion! I'm not entirely sure why my line still works even if the inside returns a vector or a data frame...because R is flexible enough to allow it? – Ray Aug 29 '17 at 03:17

2 Answers


Below we use a one-row data frame to keep the output brief:

mtcars1 <- mtcars[1, ]

Note the differences among these. We can use class as in class(mtcars["hp"]) to investigate the class of the return value.

The first two examples below correspond to the code in the question and return a data frame and a plain vector respectively. The key differences are that [ (1) can select multiple columns, (2) accepts a variable as the index and (3) returns a data frame (although see the later examples), whereas $ (1) can select only a single column, (2) requires the column name to be hard coded and (3) returns a vector.

mtcars1["hp"]  # returns data frame
##            hp
## Mazda RX4 110

mtcars1$hp # returns plain vector
## [1] 110
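
To tie this back to the question, here is a minimal sketch using the built-in mtcars data and the question's variable foo (the names by_bracket and by_dollar are only illustrative). Comparing a one-column data frame with a number gives a logical matrix that keeps the row names, whereas comparing a plain vector gives a plain logical vector:

```r
foo <- "hp"  # column name held in a variable, as in the question

by_bracket <- mtcars[foo] > 150   # one-column data frame compared with a number
by_dollar  <- mtcars$hp  > 150    # plain numeric vector compared with a number

is.matrix(by_bracket)   # TRUE: a logical matrix with row names, hence the labelled printout
is.matrix(by_dollar)    # FALSE: a plain logical vector without dimensions

# Both hold the same TRUE/FALSE values, which is why either works as a row index
all(as.vector(by_bracket) == by_dollar)
```

Because the values agree element by element, mtcars[by_bracket, ] and mtcars[by_dollar, ] select the same rows.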

Other examples where the index is a single element. Note that the first and second examples below are actually the same, since drop = TRUE is the default here.

mtcars1[, "hp"] # returns plain vector
## [1] 110  

mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110

mtcars1[, "hp", drop = FALSE] # returns data frame
##            hp
## Mazda RX4 110

Also there is the [[ operator which is like the $ operator except it can accept a variable as the index whereas $ requires the index to be hard coded:

mtcars1[["hp"]] # returns plain vector
## [1] 110
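
As a rough sketch of how [[ fits the original problem (foo is the question's variable; col and fast_cars are illustrative names), the column name can be kept in a variable and still yield a plain vector for the comparison:

```r
foo <- "hp"

col <- mtcars[[foo]]            # the same column that mtcars$hp returns
identical(col, mtcars$hp)

# Row subsetting with the variable, as suggested in the comments on the question
fast_cars <- rownames(mtcars[mtcars[[foo]] > 150, ])
identical(fast_cars, rownames(mtcars[mtcars$hp > 150, ]))
```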

Others where index specifies multiple elements. $ and [[ cannot be used with multiple elements so these examples only use [:

mtcars1[c("mpg", "hp")] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp")] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list
## $mpg
## [1] 21
## 
## $hp
## [1] 110

[

mtcars[foo] can return more than one column if foo is a vector with more than one element, e.g. mtcars[c("hp", "mpg")], and in all cases the return value is a data.frame even if foo has only one element (as it does in the question).

There is also mtcars[, foo, drop = FALSE] which returns the same value as mtcars[foo] so it always returns a data frame. With drop = TRUE it returns the column itself if foo specifies a single column. If foo specifies multiple columns, drop = TRUE still returns a data frame unless the result has only one row (as with mtcars1 above), in which case it returns a list.
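
A small sketch of how an explicit drop = TRUE behaves on the full mtcars data compared with a single row, using only base R (the variable names are illustrative). A multi-column selection collapses to a list only when the result has a single row:

```r
cols <- c("mpg", "hp")

many_rows <- mtcars[, cols, drop = TRUE]       # 32 rows, 2 columns
one_row   <- mtcars[1, ][, cols, drop = TRUE]  # 1 row, 2 columns
one_col   <- mtcars[, "hp", drop = TRUE]       # 32 rows, 1 column

is.data.frame(many_rows)                       # TRUE: nothing can be dropped
is.list(one_row) && !is.data.frame(one_row)    # TRUE: a single row collapses to a plain list
is.numeric(one_col) && is.null(dim(one_col))   # TRUE: a single column collapses to a vector
```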

[[

On the other hand mtcars[[foo]] only works if foo has one element and it returns that column, not a data frame.

$

mtcars$hp also only works for a single column, like [[, and returns the column, not a data frame containing that column.

mtcars$hp is like mtcars[["hp"]]; however, there is no possibility to pass a variable index with $. One can only hard-code the index with $.
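
The sketch below illustrates this with the question's variable foo. It also shows one further difference that is not covered above: $ partially matches column names, while [[ matches exactly by default.

```r
foo <- "hp"

is.null(mtcars$foo)                  # TRUE: `$` looks for a column literally named "foo"
identical(mtcars[[foo]], mtcars$hp)  # TRUE: `[[` evaluates foo to "hp"

# `$` completes a unique prefix of a column name; `[[` does not by default
identical(mtcars$dis, mtcars$disp)   # "dis" is completed to "disp"
is.null(mtcars[["dis"]])             # no column is named exactly "dis"
```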

subset

Note that this works:

subset(mtcars, hp > 150)

returning a data frame containing those rows where the hp column exceeds 150:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
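
subset takes the column name unquoted, so it does not directly help when the name is stored in a variable. One possible way to get the same rows in that case, sketched with the question's foo (by_subset and by_bracket are illustrative names), is the bracket form:

```r
foo <- "hp"

by_subset  <- subset(mtcars, hp > 150)        # column name written literally
by_bracket <- mtcars[mtcars[[foo]] > 150, ]   # column name taken from foo

identical(rownames(by_subset), rownames(by_bracket))
nrow(by_subset)   # 13, matching the rows listed above
```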

other objects

The above pertain to data frames but other objects that can use $, [ and [[ will have their own rules. In particular if m is a matrix, e.g. m <- as.matrix(BOD), then m[, 1] is a vector, not a one column matrix, but m[, 1, drop = FALSE] is a one column matrix. m[[1]] and m[1] are both the first element of m, not the first column. m$a does not work at all.
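
The matrix rules can be checked with a short sketch on the built-in BOD data; tryCatch is used here only so that the failing $ call does not stop the script:

```r
m <- as.matrix(BOD)            # BOD is a built-in data frame with 6 rows and 2 columns

is.null(dim(m[, 1]))           # TRUE: a plain vector
dim(m[, 1, drop = FALSE])      # 6 1: still a matrix, with one column
m[[1]] == m[1]                 # TRUE: both give the first element, not the first column

# `$` is not defined for matrices, so this call signals an error
res <- tryCatch(m$Time, error = function(e) "error")
res
```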

help

See ?Extract for more information. ?"$", ?"[" and ?"[[" all lead to the same page.

G. Grothendieck
  • Thanks for the detailed reply, especially the `?Extract` part at the end. I didn't know what to search for within R...now I know. But looking at it now, I think the help is a bit too dense -- thanks for simplifying it for me and others reading this! – Ray Aug 29 '17 at 03:20
  • @Ray You can do `help("$")`. – Roland Aug 29 '17 at 06:16
  • @Roland I wasn't aware of that...thanks! I've always used the `?` form and `?$` obviously didn't get me anywhere... – Ray Aug 29 '17 at 06:48
  • Also, `mtcars$foo` or `mtcars[["foo"]]` returns NULL and will not fail. `mtcars[, "foo"]` will fail and return an error! – Karl Mar 16 '23 at 13:35

The main difference lies in the returned object:

  • Using the single bracket [] will return a data frame.
  • When using $, you will get the column itself, as a vector of the data frame's elements.

You can apply the class(x) function to see it. In the question's example, mtcars[foo] is a data frame, but mtcars[[foo]] is a numeric vector.

Rhesous
  • I don't get what you mean with "you have an access to the data itself" and "you subset the dataframe". – nicola Aug 28 '17 at 12:27
  • Instead of having a data frame, you will have the vector of the elements of the data frame. You can apply the `class(x)` function to see it. Basically, in the previous example, `mtcars[foo]` is a data frame, but `mtcars[[foo]]` is a numeric vector. – Rhesous Aug 28 '17 at 12:29
  • In this comment you explained better than in the answer. Please edit, since the "data itself" has very little meaning. The difference, as you stated, lies in the returned object. You should make this point clearer. And, btw, `$` and `[[` are *not* equivalent. – nicola Aug 28 '17 at 12:32
  • @Arault small nitpick, but not about your answer (it is quite helpful!)... When I use `class (x)` on `[`, it says "matrix". When I apply it on `$`, it says "logical". I guess this alone shows that they are different type. But AFAIK, a matrix isn't a data frame and it's unfortunate it calls it "logical" instead of "vector". Isn't it a "vector of Booleans"? – Ray Aug 29 '17 at 03:27
  • @Ray : I think you used class on the mask ? I mean you wrote `class(mtcars$hp>150)` ? If so you'll have a vector of booleans (the class is indeed logical). To have a data.frame and a numeric vector, I used `class(mtcars["hp"])` and `class(mtcars$hp)` – Rhesous Aug 29 '17 at 11:35
  • @Arault Oh...I see. I thought `class(mtcars$hp)` would be a numeric vector and a `class(mtcars $hp > 150)` would be reported as a Boolean (or logical) vector. I mean both are vectors...I guess R doesn't see it that way. Thank you! – Ray Aug 30 '17 at 08:13