
I am getting into the programming language R, and I would like to know the difference between `dplyr::group_by()` and just `group_by()`. What does the `::` operator do?

Thanks!

Pat
  • Does this answer your question? [Is it a good practice to call functions in a package via ::](https://stackoverflow.com/questions/23232791/is-it-a-good-practice-to-call-functions-in-a-package-via) – Dan Adams May 02 '22 at 00:16



The first would be appropriate when you were unsure whether the dplyr package was loaded. The two are equivalent if it is loaded.
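
A quick way to check the "equivalent when available" point without installing anything is to use a package R attaches by default. The sketch below uses `stats` as a stand-in for dplyr (an assumption made only so it runs in a bare R session): when the package is on the search path, the bare name and the `pkg::` form retrieve the very same function object.

```r
# Assumption: stats stands in for dplyr; a default R session attaches stats,
# so the bare name `sd` is found on the search path
stats_attached <- "package:stats" %in% search()

# The bare name and the qualified name refer to the same function object,
# as long as nothing earlier on the search path also defines `sd`
same_fn <- identical(sd, stats::sd)

print(c(stats_attached = stats_attached, same_fn = same_fn))
```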

IRTFM
  • I also want to add that `::` is useful when you have two packages loaded that have the same function name; `select()` is an example here. `dplyr::select()` is useful if you have plyr loaded because you were using `ddply()` or something like that, since `select()` is also a function in plyr – hachiko May 01 '22 at 22:25
  • Yes. However, I don’t think that is an issue with `group_by`. … yet. – IRTFM May 01 '22 at 23:40
  • I think you've misused the terminology: having `dplyr` loaded doesn't mean its exported functions are visible, that's having it "attached". Being attached implies being loaded, but you can have a lot of packages loaded but not attached, so their functions aren't available without the `::` prefix. – user2554330 May 01 '22 at 23:56
  • @user2554330: you might be right, although most people use `library(tidyverse)` and I’m pretty sure that exposes `group_by`. I know this because I formerly would use `library(dplyr)` when attempting answers to Hadleyverse Q’s, and then there were always two or three missing functions that were being used that needed to be looked up to see which package was being assumed but not stated. – IRTFM May 02 '22 at 00:17
  • The `tidyverse` package is unusual in that it attaches a bunch of other packages when you attach it. So `library(tidyverse)` gets `dplyr` attached as well. In a new session, `library(tidyverse)` will add about 9 entries to the search list. – user2554330 May 02 '22 at 00:30
  • If you run `sessionInfo()`, the last part of the listing gives all the packages that are loaded and not attached. I see 43 of them after running `library(tidyverse)`. – user2554330 May 02 '22 at 00:32
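
The loaded-versus-attached distinction from these comments can be seen with base R alone. A minimal sketch, assuming a fresh session, that uses the `tools` package (shipped with R but not attached by default) in place of dplyr: its namespace is loaded, its exports are reachable with `::`, yet the bare name is not visible.

```r
# tools ships with R; load its namespace without attaching it
invisible(loadNamespace("tools"))

is_loaded   <- "tools" %in% loadedNamespaces()   # the namespace is in memory
is_attached <- "package:tools" %in% search()     # but it is not on the search path

# Exported functions are still reachable with :: ...
ext <- tools::file_ext("report.csv")

# ... while the bare name is not found from the global environment
bare_visible <- exists("file_ext")

# Namespaces that are loaded but not attached (compare the tail of sessionInfo())
loaded_only <- setdiff(loadedNamespaces(), sub("^package:", "", search()))
print(loaded_only)
```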

In `dplyr::group_by(...)`, the `::` operator says to load (but not attach) the dplyr package if necessary, and to select the `group_by` function from that package.
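
As a rough illustration of that lookup, `pkg::name` returns the same object as asking the namespace for its exported value directly. The sketch uses `tools::toTitleCase` so it runs without dplyr installed; `notARealPkg` is a hypothetical package name used only to show that the lookup fails when the package cannot be found.

```r
# `pkg::name` fetches an exported object from pkg's namespace,
# loading the namespace first if that has not happened yet
via_colons  <- tools::toTitleCase
via_lookup  <- getExportedValue("tools", "toTitleCase")
same_object <- identical(via_colons, via_lookup)

# If the package is not installed, the lookup signals an error.
# `notARealPkg` is a hypothetical name chosen so that it does not exist.
res    <- tryCatch(notARealPkg::group_by, error = function(e) e)
failed <- inherits(res, "error")

print(c(same_object = same_object, failed = failed))
```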

Without the `dplyr::` prefix, it just says to find a function named `group_by` in the current environment or its ancestor environments (which include all packages in the `search()` list).
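
Here is a small sketch of the unqualified lookup, using `stats::filter` because it is always available. The local `filter` function is hypothetical: the bare name finds it first in the global environment, while the `stats::` form ignores it.

```r
# A hypothetical local function that happens to share its name with stats::filter
filter <- function(x, ...) "my local filter"

local_result <- filter(1:5)                        # bare name: found first in the global environment
stats_result <- stats::filter(1:5, rep(1 / 3, 3))  # qualified: always the stats version (3-point moving average)
where_found  <- find("filter")                     # environments on the search path defining `filter`, in order

print(local_result)
print(as.numeric(stats_result))
print(where_found)

rm(filter)  # remove the local definition so the bare name points at stats::filter again
```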

Using `::` is a bit safer (in a long script or package you might have created a local function named `group_by` that you didn't mean to run, or some other package may have a function with that name), but also a bit slower.
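
To see the speed difference on your own machine, a rough timing sketch follows. It uses `base::abs` because the function itself is so cheap that the cost of the `::` lookup dominates; the absolute numbers, and whether the gap matters in practice, depend on the machine and R version, so no particular result is claimed here.

```r
# Rough timing sketch; absolute numbers depend on the machine and R version
n <- 1e5

t_qualified <- system.time(for (i in seq_len(n)) base::abs(-1))  # `::` lookup on every call
t_bare      <- system.time(for (i in seq_len(n)) abs(-1))        # ordinary search-path lookup

abs_once <- base::abs                                             # `::` evaluated a single time
t_once   <- system.time(for (i in seq_len(n)) abs_once(-1))

timings <- c(qualified = t_qualified[["elapsed"]],
             bare      = t_bare[["elapsed"]],
             bound     = t_once[["elapsed"]])
print(timings)
```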

Once you're writing R packages (which is easier than you think), you can import `group_by` explicitly from dplyr and get both advantages: safety and speed. It's roughly equivalent to doing

```r
group_by <- dplyr::group_by  # bind dplyr's group_by to a local name
# ... many uses of group_by here ...
```

where you only pay the cost of `::` once.
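
In a package, that explicit import is usually declared in the `NAMESPACE` file, often generated from a roxygen2 `@importFrom` tag. A minimal sketch, assuming a package whose `DESCRIPTION` lists dplyr under `Imports:`; the function `count_by_group()` is hypothetical and only illustrates where the tag goes.

```r
# R/count_by_group.R in a hypothetical package whose DESCRIPTION lists dplyr under Imports:

#' Count rows per group (hypothetical helper for illustration)
#' @param data A data frame.
#' @param col  Column to group by.
#' @importFrom dplyr group_by summarise n
#' @export
count_by_group <- function(data, col) {
  grouped <- group_by(data, {{ col }})  # resolved through the package's imports, no dplyr:: needed
  summarise(grouped, n = n())
}

# Running roxygen2::roxygenise() writes directives like these to NAMESPACE:
# export(count_by_group)
# importFrom(dplyr,group_by)
# importFrom(dplyr,n)
# importFrom(dplyr,summarise)
```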

user2554330
  • Also - the `::` helps resolve namespace collisions where multiple packages are loaded that have conflicting function names. E.g. `stats::filter()` vs `dplyr::filter()`. The [{conflicted}](https://github.com/r-lib/conflicted) package is a great tool to alleviate this issue. – Dan Adams May 02 '22 at 00:14
  • That's really the same issue. If both `stats` and `dplyr` are attached, both versions of `filter` will be found in ancestors of the current environment. R will pick the first one it finds. Whether that's the one from `stats` or from `dplyr` depends on where they are in the search list. – user2554330 May 02 '22 at 00:18
  • Fair enough, but 'ancestors' is a bit of an esoteric term so it might improve the answer to explain a bit further. – Dan Adams May 02 '22 at 11:11
  • @DanAdams: I've made a small edit. – user2554330 May 02 '22 at 11:47
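
A base-R sketch of the search-order point: `attach()` on a list serves as a hypothetical stand-in for attaching a second package that also provides `filter`, so dplyr is not needed. Because `attach()` puts it at position 2, ahead of `package:stats`, the bare name resolves to it; attaching dplyr after stats is the situation `library(dplyr)` reports as masking `stats::filter`.

```r
# Hypothetical stand-in for a package that also provides `filter`.
# attach() inserts it at position 2, i.e. ahead of package:stats.
attach(list(filter = function(...) "filter from demo_pkg"),
       name = "demo_pkg", warn.conflicts = FALSE)

pos_demo  <- match("demo_pkg", search())
pos_stats <- match("package:stats", search())
winner    <- filter()   # R uses the first `filter` it finds along the search list

detach("demo_pkg")      # restore the original search list

print(c(demo_pkg = pos_demo, stats = pos_stats))
print(winner)
```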