
I am completely confused. I have a function that creates a table of quantiles. If the data passed to the function contains a column named "species", I want to group by that column; otherwise the same code should run ungrouped. I get a warning that this is deprecated, but even so, it is strange that all my variables are changing.

I am pretty sure that this is new behaviour that didn't happen before, since I have been using this function unchanged for about two years.

Can somebody have a look?

library(dplyr)

set.seed(1)
df<- data.frame(Intensity=rnorm(1000, 25, 3))
class(df)
#> [1] "data.frame"
df_backup <- df
class(df_backup)
#> [1] "data.frame"
my_plotAbundanceRank <- function(data_set) {
    quantile_df <- 
        data_set %>% 
        dplyr::group_by_at(vars(matches('^species$'))) %>%
        dplyr::summarise(`5%`=stats::quantile(log10(Intensity),.05),
                         `50%`=stats::quantile(log10(Intensity),.50),
                         `95%`=stats::quantile(log10(Intensity),.95)) 
}
print(my_plotAbundanceRank(df))
#> # A tibble: 1 x 3
#>    `5%` `50%` `95%`
#>   <dbl> <dbl> <dbl>
#> 1  1.30  1.40  1.48
class(df)
#> [1] "tbl_df"     "tbl"        "data.frame"
class(df_backup)
#> [1] "tbl_df"     "tbl"        "data.frame"

After execution, the class changes from [1] "data.frame" to [1] "tbl_df" "tbl" "data.frame" for all variables, even those that are never passed to the function.

I am using dplyr_0.8.0.1 which is pretty new and might cause the problem.

Any ideas?

UPDATE

I tested with dplyr_0.7.8 and the code works as expected: all variables stay data.frame.

devtools::install_version("dplyr", version = "0.7.8", repos = "http://cran.us.r-project.org")
drmariod
  • No problem with version `0.7.8` – Clemsang Feb 22 '19 at 07:53
  • I just tested and updated. It comes with `0.8` though. – drmariod Feb 22 '19 at 07:54
  • `dplyr::intersect(names(data_set), c("species"))` is `character(0)` in your example, replacing `species` by `Intensity` gives me `data.frame` for 0.8.0.1 – Clemsang Feb 22 '19 at 08:23
  • @Clemsang The idea is to check whether a column `species` exists; if so, I want to `group_by` this species column. I found this solution some time ago here on SO. I don't want to use `if else` here, since I would need to duplicate code (the function is more complex in reality, this is just a reproducible example). – drmariod Feb 22 '19 at 08:31
  • Maybe using `group_by_if` ? – Clemsang Feb 22 '19 at 08:42
  • @Clemsang `dplyr::group_by_at(vars(matches('^species$')))` did the trick, I will update, but still don't understand why it changes class of all my variables? – drmariod Feb 22 '19 at 09:02
  • forgot to test under `dplyr_0.8.0` and still get the class modification :-( – drmariod Feb 22 '19 at 09:32
  • what about `dplyr::group_by("species")` ? – Clemsang Feb 25 '19 at 08:38
  • @Clemsang this fails if `species` column doesn't exist... I want to group if the column is there, if not -> no grouping! – drmariod Feb 25 '19 at 09:03
  • It seems to work with `group_by` instead of your previous `group_by_` – Clemsang Feb 25 '19 at 09:06
  • it looks like a bug but can anyone else actually reproduce this ? could you use `reprex::reprex` on this ? it's really better for this type of issues – moodymudskipper Feb 25 '19 at 13:56
  • This is weird indeed, I don't have the last `dplyr` at hand but I suggest you post it there : https://github.com/tidyverse/dplyr/issues , you can probably make it more minimal too, it seems like these `group_by` and `summarize` are not necessary (or if they are it should be mentioned as it'll be a further clue) – moodymudskipper Feb 25 '19 at 14:04
  • I filed a bug report this morning: https://github.com/tidyverse/dplyr/issues/4221. The `group_by` is actually necessary to trigger the problem, as far as I can tell. The whole thing can be fixed by testing whether the `species` column exists and using a specific command, but I didn't want to duplicate code and it was working fine under 0.7.8 – drmariod Feb 25 '19 at 14:08
  • This seems similar to the documented behaviour of `data.table` [due to pass-by-reference](https://stackoverflow.com/questions/10225098/understanding-exactly-when-a-data-table-is-a-reference-to-vs-a-copy-of-another). You can avoid this side effect using `df_backup <- data.table::copy(df)` in the same way that you would with a data.table. – dww Feb 26 '19 at 02:39
  • This appears to be resolved by `dplyr v0.8.3` – Dave Gruenewald Jan 09 '20 at 18:30
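
A minimal sketch of the "test whether `species` exists" idea from the comments, arranged so that only the grouping step is conditional and the `summarise()` call is written once. The function name `quantile_table` is hypothetical. It sidesteps the zero-column `group_by_at()` call that the comments identify as the trigger; whether that avoids the class change on dplyr 0.8.0.x is an assumption that has not been verified here.

```r
library(dplyr)

# Hypothetical helper: group by `species` only when that column exists,
# then run one shared summarise() so the quantile code is not duplicated.
quantile_table <- function(data_set) {
  if ("species" %in% names(data_set)) {
    data_set <- dplyr::group_by(data_set, species)
  }
  dplyr::summarise(
    data_set,
    `5%`  = stats::quantile(log10(Intensity), 0.05),
    `50%` = stats::quantile(log10(Intensity), 0.50),
    `95%` = stats::quantile(log10(Intensity), 0.95)
  )
}

set.seed(1)
df_plain <- data.frame(Intensity = rnorm(1000, 25, 3))
df_species <- data.frame(
  Intensity = rnorm(1000, 25, 3),
  species   = rep(c("a", "b"), each = 500)
)

res_plain   <- quantile_table(df_plain)    # ungrouped: one row
res_species <- quantile_table(df_species)  # grouped: one row per species
```

The reassignment of `data_set` inside the function only rebinds the local name, so the caller's data frame is not meant to be touched.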

1 Answer


This bug has been fixed as of dplyr v0.8.1.
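
A rough way to check this on your own machine, as a sketch: compare the installed dplyr version against 0.8.1 (the version named above) and rerun the zero-column grouping from the question to see whether the class of the input survives. The variable names `has_fix` and `class_unchanged` are made up for illustration.

```r
library(dplyr)

# Is the installed dplyr at least the version named in the answer?
has_fix <- utils::packageVersion("dplyr") >= "0.8.1"

set.seed(1)
df <- data.frame(Intensity = rnorm(1000, 25, 3))
class_before <- class(df)

# Same grouping call as in the question; no column matches, so zero grouping columns.
invisible(dplyr::group_by_at(df, dplyr::vars(dplyr::matches("^species$"))))

# TRUE if the input data frame kept its original class.
class_unchanged <- identical(class(df), class_before)

if (!has_fix) {
  message("dplyr is older than 0.8.1; updating with install.packages('dplyr') may help.")
}
```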

Mark