
I was taking a course on R and came across this line of code:

onlineta_city_hotels <- filter(hotel_bookings, 
                           (hotel=="City Hotel" & 
                             hotel_bookings$market_segment=="Online TA"))

This code produced the expected result. However, I then removed the hotel_bookings$ prefix and ran this version instead:

onlineta_city_hotels <- filter(hotel_bookings, 
                           (hotel=="City Hotel" & 
                             market_segment=="Online TA"))

This code gave the same result. So what is the need for the dollar sign in such statements?

AnilGoyal
SATHWIK
    You should *not* use `$` indexing inside `dplyr::filter` (just like you wouldn't use `$` indexing inside base R's `subset`). The first code example is not good R code. – Maurits Evers Jul 06 '22 at 04:59

1 Answer


Most tidyverse functions (the tidyverse includes dplyr) use data masking. To quote the package authors:

.. provides data masking, which blurs the distinction between two types of variables:

  1. env-variables are "programming" variables and live in an environment. They are usually created with <-. Env-variables can be any type of R object.
  2. data-variables are "statistical" variables and live in a data frame. Data-variables live inside data frames, so must be vectors.
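
To make the quoted distinction concrete, here is a minimal sketch. The `bookings` data frame and the `target_segment` variable are made up for illustration and are not part of the original hotel_bookings data. `.data` and `.env` are the pronouns that dplyr makes available inside data-masked calls.

```r
library(dplyr)

# Hypothetical stand-in for hotel_bookings
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel"),
  market_segment = c("Online TA", "Online TA", "Direct"),
  stringsAsFactors = FALSE
)

# Env-variable: created with <- and lives in the calling environment
target_segment <- "Online TA"

# market_segment is a data-variable (a column of bookings);
# target_segment is an env-variable. Data masking lets both be
# written as bare names in the same expression.
res <- filter(bookings, market_segment == target_segment)

# The pronouns spell out where each name is looked up
res_explicit <- filter(bookings, .data$market_segment == .env$target_segment)

print(res)
print(res_explicit)
```

Both calls should keep the same two rows. The pronoun form is mostly useful when a column and a programming variable happen to share a name.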

The filter function comes from dplyr, which uses data masking. So the market_segment variable, which lives inside hotel_bookings, can be referenced directly inside dplyr functions. This is not always the case with base R functions, where the column usually has to be spelled out with $:

hotel_bookings[hotel_bookings$hotel == "City Hotel" &
                 hotel_bookings$market_segment == "Online TA", ]

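The following will not work, because base R's [ looks up hotel and market_segment in the calling environment rather than in the data frame: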

hotel_bookings[hotel == "City Hotel" &
                 market_segment == "Online TA", ]
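
A small sketch of why the bare-name form fails in base R, using a made-up `bookings` data frame in place of hotel_bookings. It assumes no objects called `hotel` or `market_segment` exist in your workspace. Base R's `with()` and `subset()` are included as alternatives that do evaluate column names inside the data frame.

```r
# Hypothetical stand-in for hotel_bookings
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  market_segment = c("Online TA", "Online TA", "Direct", "Online TA"),
  stringsAsFactors = FALSE
)

# `[` evaluates its arguments in the calling environment, so bare column
# names are treated as ordinary variables and are not found.
bare_attempt <- tryCatch(
  bookings[hotel == "City Hotel" & market_segment == "Online TA", ],
  error = function(e) e
)
print(inherits(bare_attempt, "error"))

# Explicit `$` indexing works
dollar_rows <- bookings[bookings$hotel == "City Hotel" &
                          bookings$market_segment == "Online TA", ]

# with() and subset() evaluate the condition inside the data frame
with_rows <- bookings[with(bookings, hotel == "City Hotel" &
                             market_segment == "Online TA"), ]
subset_rows <- subset(bookings, hotel == "City Hotel" &
                        market_segment == "Online TA")

print(dollar_rows)
```

The three working variants should select the same rows. `subset()` is the closest base R analogue to `filter()`, which is why the comment above compares the two.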

Thus, inside dplyr verbs that use data masking (such as filter), column names can be used by themselves, without $.
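
As a rough illustration, the sketch below compares the bare-name form with explicit `$` indexing on a made-up `bookings` data frame (a stand-in for hotel_bookings). It also shows one situation where `$` inside `filter()` goes wrong: `bookings$market_segment` always refers to the full original data frame, so it stops lining up once an earlier step has dropped rows.

```r
library(dplyr)

# Hypothetical stand-in for hotel_bookings
bookings <- data.frame(
  hotel = c("City Hotel", "Resort Hotel", "City Hotel", "City Hotel"),
  market_segment = c("Online TA", "Online TA", "Direct", "Online TA"),
  stringsAsFactors = FALSE
)

# Bare column names inside filter(); comma-separated conditions are ANDed
dplyr_rows <- filter(bookings, hotel == "City Hotel",
                     market_segment == "Online TA")

# Base R equivalent with explicit `$`
base_rows <- bookings[bookings$hotel == "City Hotel" &
                        bookings$market_segment == "Online TA", ]

# Same rows in both results (row names differ, so compare the columns)
same_rows <- identical(dplyr_rows$hotel, base_rows$hotel) &&
  identical(dplyr_rows$market_segment, base_rows$market_segment)
print(same_rows)

# Pitfall: after the first filter() only 3 rows remain, but
# bookings$market_segment still has 4 elements, so the second
# filter() rejects the condition with an error.
piped_attempt <- tryCatch(
  bookings %>%
    filter(hotel == "City Hotel") %>%
    filter(bookings$market_segment == "Online TA"),
  error = function(e) e
)
print(inherits(piped_attempt, "error"))
```

The first example in the question worked only because hotel_bookings$market_segment happened to have the same length as the data being filtered.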

For further reading/reference please see this page.

AnilGoyal
    I don't think *"attaches the dataset"* or the term "attach" is quite the right term here. `dplyr` does *data masking*, which creates an environment from the `data.frame` and extracts variables from that env (internally using `$`). `attach` is often [mis/abused](https://stackoverflow.com/questions/10067680/why-is-it-not-advisable-to-use-attach-in-r-and-what-should-i-use-instead) and is different to what `dplyr` does. – Maurits Evers Jul 06 '22 at 05:20