I am using R 3.3.3, dplyr 0.7.4, and Hmisc 4.1-1. I noticed that the order in which I load packages affects whether a dplyr::summarise call works. I understand that loading packages in a different order masks certain functions, but I am using the package::function() syntax to avoid that issue. The exact issue revolves around labelled variables. I know that there have been issues in the past with tidyverse and variable labels, but none seem to address why this particular situation is occurring.

First example, which works: I load only Hmisc and then dplyr, and I am able to summarise the data:

#this works fine
library(Hmisc)
library(dplyr)

Hmisc::label(iris$Petal.Width) <- "Petal Width"

sumpct <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(
    med = median(Petal.Width),
    A40 = round(100 * ecdf(Petal.Width)(0.40), 1),
    A50 = round(100 * ecdf(Petal.Width)(0.50), 1),
    mns = mean(Petal.Width),
    lowermean = mean(Petal.Width) - sd(Petal.Width),
    lowermedian = median(Petal.Width) - sd(Petal.Width)
  )

The second example below breaks. I start a new session, load tidyverse after Hmisc, and still use the package::function() syntax, but this throws the error:

Error in summarise_impl(.data, dots) : Evaluation error: x and labels must be same type.

Second example:

###restart session 
#this example does not work

library(Hmisc)
library(tidyverse)


Hmisc::label(iris$Petal.Width) <- "Petal Width"

sumpct <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(
    med = median(Petal.Width),
    A40 = round(100 * ecdf(Petal.Width)(0.40), 1),
    A50 = round(100 * ecdf(Petal.Width)(0.50), 1),
    mns = mean(Petal.Width),
    lowermean = mean(Petal.Width) - sd(Petal.Width),
    lowermedian = median(Petal.Width) - sd(Petal.Width)
  )

However, the third example does work: I just restart the session and load tidyverse before Hmisc.

Third example:

###switch order of loading packages and this works

library(tidyverse)
library(Hmisc)


Hmisc::label(iris$Petal.Width) <- "Petal Width"

sumpct <- iris %>%
  dplyr::group_by(Species) %>%
  dplyr::summarise(
    med = median(Petal.Width),
    A40 = round(100 * ecdf(Petal.Width)(0.40), 1),
    A50 = round(100 * ecdf(Petal.Width)(0.50), 1),
    mns = mean(Petal.Width),
    lowermean = mean(Petal.Width) - sd(Petal.Width),
    lowermedian = median(Petal.Width) - sd(Petal.Width)
  )

So my question is: why does the order in which I load packages matter when I am using the package::function() syntax, specifically with respect to labelled variables and tidyverse?

Update: session info below for the error:

sessionInfo()

R version 3.3.3 (2017-03-06)
Running under: Windows 7 x64

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] bindrcpp_0.2    forcats_0.3.0   stringr_1.3.0   dplyr_0.7.4
 [5] purrr_0.2.4     readr_1.1.1     tidyr_0.8.0     tibble_1.4.2
 [9] tidyverse_1.2.1 Hmisc_4.1-1     ggplot2_2.2.1   Formula_1.2-2
[13] survival_2.41-3 lattice_0.20-35

loaded via a namespace (and not attached):
 [1] reshape2_1.4.3      splines_3.3.3       haven_1.1.1
 [4] colorspace_1.3-2    htmltools_0.3.6     base64enc_0.1-3
 [7] rlang_0.2.0         pillar_1.2.1        foreign_0.8-69
[10] glue_1.2.0          RColorBrewer_1.1-2  readxl_1.0.0
[13] modelr_0.1.1        plyr_1.8.4          bindr_0.1.1
[16] cellranger_1.1.0    munsell_0.4.3       gtable_0.2.0
[19] rvest_0.3.2         htmlwidgets_1.0     psych_1.7.8
[22] latticeExtra_0.6-28 knitr_1.20          parallel_3.3.3
[25] htmlTable_1.11.2    broom_0.4.3         Rcpp_0.12.16
[28] acepack_1.4.1       scales_0.5.0        backports_1.1.2
[31] checkmate_1.8.5     jsonlite_1.5        gridExtra_2.3
[34] mnormt_1.5-5        hms_0.4.2           digest_0.6.15
[37] stringi_1.1.7       grid_3.3.3          cli_1.0.0
[40] tools_3.3.3         magrittr_1.5        lazyeval_0.2.1
[43] cluster_2.0.6       crayon_1.3.4        pkgconfig_2.0.1
[46] Matrix_1.2-12       xml2_1.2.0          data.table_1.10.4-3
[49] lubridate_1.7.3     assertthat_0.2.0    httr_1.3.1
[52] rstudioapi_0.7      R6_2.2.2            rpart_4.1-13
[55] nnet_7.3-12         nlme_3.1-131.1

Mike
  • I just tried this, and both the second and third options gave me the error. So perhaps it is to do with the combination of Hmisc and tidyverse, rather than the package order? Perplexing – Calum You Mar 20 '18 at 20:03
  • Interesting I just tried the third example again and it worked, what versions are you using, and did you restart your session? Either way it is an interesting problem. – Mike Mar 20 '18 at 20:09
  • I currently have Hmisc 4.1-1, dplyr 0.7.4, tidyverse 1.2.1, R 3.4.3. I will try updating R and all other packages to see what happens; I recommend you do the same and post your session info – Calum You Mar 20 '18 at 20:16
  • Just added session info – Mike Mar 20 '18 at 20:33
  • Seems to be caused by both `haven` and `Hmisc` using a `labelled` S3 class and defining `[.labelled`. You use `Hmisc::label` to create a `labelled` variable, but when you try to subset it with `[` in any scenario, the `haven` version is called instead if you loaded `tidyverse` after `Hmisc`. You can see this with `getAnywhere("[.labelled")`. – Mikko Marttila Mar 20 '18 at 21:25
  • Just to drill down on the issue, a minimal example would be to just do `head(iris)` after you've assigned the label to `iris$Petal.Width`. – Mikko Marttila Mar 20 '18 at 21:33
  • And the final piece is that, because `haven` is listed as an Import in `tidyverse`, its namespace gets loaded when `tidyverse` is loaded. And if that happens _after_ the `Hmisc` namespace has been loaded, the `[.labelled` S3 method from `haven` will be found before the `Hmisc` one during S3 dispatch. There doesn't seem to be any way around this, other than changes to either `haven` or `Hmisc`. – Mikko Marttila Mar 20 '18 at 21:55
  • Great thank you for the response. If you submit it as the answer I will accept it. – Mike Mar 21 '18 at 11:55

1 Answer

UPDATE: As of haven version 2.0.0 this issue has been resolved, as the haven "labelled" class was renamed to "haven_labelled" to avoid conflicts with Hmisc.
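
A quick way to tell whether an installation predates that fix is to compare the installed haven version against 2.0.0. The following is a minimal sketch; the helper name `haven_has_renamed_class` is hypothetical and only wraps a base R version comparison.

```r
# Hypothetical helper (not part of haven or Hmisc): TRUE if the given
# haven version is at least 2.0.0, the release named in the update above.
haven_has_renamed_class <- function(version) {
  package_version(as.character(version)) >= "2.0.0"
}

haven_has_renamed_class("1.1.1")  # FALSE: the haven version in the question's session info
haven_has_renamed_class("2.0.0")  # TRUE

# Check the locally installed version, if haven is installed at all.
if (requireNamespace("haven", quietly = TRUE)) {
  print(haven_has_renamed_class(utils::packageVersion("haven")))
}
```

Note that `requireNamespace()` itself loads the haven namespace, so on an affected version it is best run in a throwaway session.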


tl;dr: Order matters.

For a more detailed answer, let's first reproduce the error:

library(Hmisc)
#> Loading required package: lattice
#> Loading required package: survival
#> Loading required package: Formula
#> Loading required package: ggplot2
#> 
#> Attaching package: 'Hmisc'
#> The following objects are masked from 'package:base':
#> 
#>     format.pval, units
library(tidyverse)
#> Warning: package 'forcats' was built under R version 3.4.4

After removing elements piece by piece from the original summarise example, I managed to reduce the reproduction of the error to just these lines of code:

Hmisc::label(iris$Petal.Width) <- "Petal Width"
head(iris)
#> Error: `x` and `labels` must be same type

We can have a look at the traceback to see if we can locate a function that could be causing the error:

traceback()
#> 8: stop("`x` and `labels` must be same type", call. = FALSE)
#> 7: labelled(NextMethod(), attr(x, "labels"))
#> 6: `[.labelled`(xj, i)
#> 5: xj[i]
#> 4: `[.data.frame`(x, seq_len(n), , drop = FALSE)
#> 3: x[seq_len(n), , drop = FALSE]
#> 2: head.data.frame(iris)
#> 1: head(iris)

The [.labelled call looks suspicious. Why is it even called?

lapply(iris, class)
#> $Sepal.Length
#> [1] "numeric"
#> 
#> $Sepal.Width
#> [1] "numeric"
#> 
#> $Petal.Length
#> [1] "numeric"
#> 
#> $Petal.Width
#> [1] "labelled" "numeric" 
#> 
#> $Species
#> [1] "factor"

Ah, setting a label for Petal.Width with Hmisc::label also added the S3 class. We can inspect where the method is defined with getAnywhere:

getAnywhere("[.labelled")
#> 2 differing objects matching '[.labelled' were found
#> in the following places
#>   registered S3 method for [ from namespace haven
#>   namespace:Hmisc
#>   namespace:haven
#> Use [] to view one of them

Indeed, both haven and Hmisc define the method. And since the haven namespace is loaded after Hmisc, its registration of the method is the one that takes effect, and thus gets used:

getAnywhere("[.labelled")[1]
#> function (x, ...) 
#> {
#>     labelled(NextMethod(), attr(x, "labels"))
#> }
#> <environment: namespace:haven>
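
The "last registration wins" behaviour can be imitated in base R without either package. This is only a sketch of the mechanism as I understand it: the class name `demo_labelled` and the two method functions are made up for illustration, and `registerS3method()` stands in for what happens when a package namespace is loaded.

```r
# Hypothetical class and methods, used only to illustrate the mechanism.
first_method  <- function(x, ...) "first registration"
second_method <- function(x, ...) "second registration"

x <- structure(1:3, class = "demo_labelled")

# Register a format() method for the class, as one package might on load.
registerS3method("format", "demo_labelled", first_method)
result_before <- format(x)

# A second registration for the same generic/class pair replaces the first,
# much like haven's `[.labelled` replacing the one registered by Hmisc.
registerS3method("format", "demo_labelled", second_method)
result_after <- format(x)

result_before  # "first registration"
result_after   # "second registration"
```

Nothing about the object `x` changes between the two calls; only the method that dispatch finds for its class does.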

haven expects labelled objects to have a labels attribute, which Hmisc::label doesn't create:

attr(iris$Petal.Width, "labels")
#> NULL

And that's where the error comes from.


But wait: why is haven even loaded? It's not attached with library(tidyverse). It turns out that haven is listed as an imported package in tidyverse, which causes its namespace to be loaded when tidyverse is attached (see e.g. here). And loading a package, among other things, registers its S3 methods, which is where the conflict comes from.

As it is, if you want to use both Hmisc and tidyverse, order matters. To address the issue further would likely require source level changes in the packages' use of the labelled S3 class.
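
If changing the load order is not an option, one possible user-side workaround (my assumption, not something either package documents) is to drop the "labelled" class from the column while keeping its label attribute, so that `[` falls back to the default method. The helper `strip_labelled` below is hypothetical, and the vector is built by hand so the sketch runs without Hmisc.

```r
# Hand-built stand-in for a column labelled with Hmisc::label():
# a numeric vector with a "label" attribute and the "labelled" class.
x <- structure(c(0.2, 0.4, 1.3),
               label = "Petal Width",
               class = c("labelled", "numeric"))

# Hypothetical helper: remove only the "labelled" class, keep other attributes.
strip_labelled <- function(v) {
  class(v) <- setdiff(class(v), "labelled")
  v
}

y <- strip_labelled(x)

inherits(y, "labelled")  # FALSE, so `[.labelled` is no longer dispatched
attr(y, "label")         # "Petal Width"
```

The trade-off is that any Hmisc functionality relying on the "labelled" class, such as keeping the label when subsetting, no longer applies to that column.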

Created on 2018-03-21 by the reprex package (v0.2.0).

Mikko Marttila
  • Incredible answer and bug-digging! Little question: wouldn't it be possible to manually override the method? For a simple function I would write something like `\`[.labelled\` <- Hmisc::\`[.labelled\``, but it doesn't seem to apply here. – Dan Chaltiel Sep 13 '18 at 10:23
  • @DanChaltiel Good question, but I have to say I'm a bit out of my depth here. I think this goes to the details of how S3 method dispatch works (and I can't find a good reference with a quick search), but it seems that the gist of it is that methods in loaded packages are found before methods in the global environment. A hack-y workaround could be to override the method in the **haven** namespace (but beware if you actually want to use the **haven** version at some point!). This should do the trick: `assignInNamespace("[.labelled", Hmisc:::"[.labelled", asNamespace("haven"))`. – Mikko Marttila Sep 13 '18 at 12:07