
I wanted to create a bar plot in which the bars are ordered by height rather than alphabetically by category. This worked fine when ggplot2 was the only package I had loaded. However, after I loaded a few more packages and reran the same code to create, sort, and plot my data frame, the bars were sorted alphabetically again.

I checked the data frame each time using str() and it turned out that the attributes of the data frame were now different, even though I'd run the same code each time.

My code and output are listed below. Can anyone explain the differing behavior? Why does loading a few apparently unrelated packages (unrelated in the sense that none of the functions I'm using seem to be masked by the newly loaded packages) change the result of running the transform() function?

Case 1: Just ggplot2 loaded

library(ggplot2)

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
data = data.frame(group,num)
data1 = transform(data, group=reorder(group,-num))

> str(data1)
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "C","F","B","D",..: 1 2 4 3 5 6
  ..- attr(*, "scores")= num [1:6(1d)] -2 -7 -12 -7 -1 -11
  .. ..- attr(*, "dimnames")=List of 1
  .. .. ..$ : chr  "A" "B" "C" "D" ...
 $ num  : num  12 11 7 7 2 1

Case 2: Load several more packages, then run the same code again

library(plyr)
library(xtable)
library(Hmisc)
library(gmodels)
library(reshape2)
library(vcd)
library(lattice)

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
data = data.frame(group,num)
data1 = transform(data, group=reorder(group,-num))

> str(data1)
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "A","B","C","D",..: 3 6 4 2 1 5
 $ num  : num  12 11 7 7 2 1

UPDATE: sessionInfo() output

Case 1: Ran sessionInfo() after loading ggplot2

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
  [1] C/en_US.UTF-8/C/C/C/C

attached base packages:
  [1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] ggplot2_0.9.1

loaded via a namespace (and not attached):
  [1] MASS_7.3-18        RColorBrewer_1.0-5 colorspace_1.1-1   dichromat_1.2-4    digest_0.5.2       grid_2.15.0       
[7] labeling_0.1       memoise_0.1        munsell_0.3        plyr_1.7.1         proto_0.3-9.2      reshape2_1.2.1    
[13] scales_0.2.1       stringr_0.6        tools_2.15.0

Case 2: Ran sessionInfo() after loading the additional packages

> sessionInfo()
R version 2.15.0 (2012-03-30)
Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

locale:
  [1] C/en_US.UTF-8/C/C/C/C

attached base packages:
  [1] grid      splines   stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
  [1] lattice_0.20-6   vcd_1.2-13       colorspace_1.1-1 MASS_7.3-18      reshape2_1.2.1   gmodels_2.15.2  
[7] Hmisc_3.9-3      survival_2.36-14 xtable_1.7-0     plyr_1.7.1       ggplot2_0.9.1   

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.0-5 cluster_1.14.2     dichromat_1.2-4    digest_0.5.2       gdata_2.8.2        gtools_2.6.2      
[7] labeling_0.1       memoise_0.1        munsell_0.3        proto_0.3-9.2      scales_0.2.1       stringr_0.6       
[13] tools_2.15.0
Menuka Ishan
eipi10
  • Could you provide the output of `sessionInfo()`? If anyone can help, they may have to match your R and package versions to replicate this. – joran Jun 07 '12 at 20:49
  • I can replicate this on R 2.15.0 with the up to date CRAN packages (on Ubuntu) – Justin Jun 07 '12 at 20:54
  • Very interesting. Looks like the change in the results of `transform()` only appears after loading `gmodels` (and it's not fixed by subsequently detaching `gmodels`). I'm intrigued... (FWIW, I'm on Windows XP, running R-devel, so it looks like this is not an OS or version specific problem.) – Josh O'Brien Jun 07 '12 at 21:14
  • @joran I've added output of `sessionInfo()` as an edit to my question. – eipi10 Jun 07 '12 at 21:14
  • You can get similar behavior with most `ggplot2` objects by running `str` with and without loading `library(proto)`. Using `proto` greatly expands the display of `proto` objects. – Faheem Mitha May 19 '13 at 14:45
  • I believe this is also marginally relevant to the problem: http://stackoverflow.com/a/20335767/911945 . In short, `reorder` needs the second parameter to be `as.factor(.)` to order properly. – Anton Tarasenko Dec 03 '13 at 05:08

1 Answer


This happens because:

  1. gmodels imports gdata
  2. gdata defines a reorder.factor method for the reorder generic

Start a clean session. Then:

methods("reorder")
[1] reorder.default*    reorder.dendrogram*

Now load gdata (or load gmodels, which has the same effect):

library(gdata)
methods("reorder")
[1] reorder.default*    reorder.dendrogram* reorder.factor 

Notice that nothing is masked, since base R has no reorder.factor method of its own.
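
The same effect can be reproduced without gdata. The following is a minimal sketch using a made-up generic, `describe` (a hypothetical name, not part of any package), to show how adding a class-specific method changes the result of an unchanged call:

```r
# Hypothetical generic, used only to illustrate S3 dispatch.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "default method"

f <- factor(c("a", "b"))

# Only describe.default exists, so it handles the factor.
before <- describe(f)

# Now add a method for class "factor", much as gdata does for reorder.
describe.factor <- function(x, ...) "factor method"

# Same call, same object, but dispatch now picks describe.factor.
after <- describe(f)

before
after
```

Nothing was overwritten here: `describe.default` is still available and unchanged. The generic simply prefers the more specific method once one exists.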

Recreate the problem, but this time call each package's method explicitly:

group = c("C","F","D","B","A","E")
num = c(12,11,7,7,2,1)
data = data.frame(group,num)

The base R version (using reorder.default):

str(transform(data, group=stats:::reorder.default(group,-num)))
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "C","F","B","D",..: 1 2 4 3 5 6
  ..- attr(*, "scores")= num [1:6(1d)] -2 -7 -12 -7 -1 -11
  .. ..- attr(*, "dimnames")=List of 1
  .. .. ..$ : chr  "A" "B" "C" "D" ...
 $ num  : num  12 11 7 7 2 1

The gdata version (using reorder.factor):

str(transform(data, group=gdata:::reorder.factor(group,-num)))
'data.frame':   6 obs. of  2 variables:
 $ group: Factor w/ 6 levels "A","B","C","D",..: 3 6 4 2 1 5
 $ num  : num  12 11 7 7 2 1
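
If the result should not depend on which packages happen to be loaded, one option is to set the factor levels directly rather than going through reorder(). This is a sketch, assuming each group appears only once in the data (as it does here). The `stringsAsFactors = TRUE` argument is only needed on R 4.0 and later, where data.frame() no longer converts strings to factors by default:

```r
group <- c("C", "F", "D", "B", "A", "E")
num   <- c(12, 11, 7, 7, 2, 1)
data  <- data.frame(group, num, stringsAsFactors = TRUE)

# Order the levels by descending num without calling reorder(),
# so no S3 dispatch on reorder is involved.
ord <- order(-data$num)
data$group <- factor(data$group, levels = as.character(data$group[ord]))

levels(data$group)
```

Ties keep their original row order with this approach, so D comes before B here, whereas reorder.default put B first.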
Andrie
  • You can get the "expected" order using the `gdata::reorder.factor` version by adding a `FUN=identity` argument: `data1 = transform(data, group=reorder(group,-num,FUN=identity))`. – Brian Diggs Jun 07 '12 at 21:46
  • Just to make sure I understand the lesson here: When you load a package, you can get different behavior with the exact same code, even in the absence of masking, if the new package has a method specific to your object (in this case `reorder.factor`), that "overrides" the behavior of the "top"-level method (in this case, generic `reorder`) that would otherwise apply to your object. Is that correct? – eipi10 Jun 08 '12 at 12:34
  • @eipi10 Yes, your example clearly illustrates that. Pedantry about terminology: `reorder.factor` gets dispatched rather than `reorder.default` (thus in a sense overriding the previous behaviour). This is a very interesting problem. Thank you. – Andrie Jun 08 '12 at 12:44
  • Andrie, thanks for the clear and detailed answer. @BrianDiggs Thanks for showing how to recover the desired behavior. – eipi10 Jun 08 '12 at 13:08
  • Many thanks everyone. This problem was driving me berserk. Is there any way to add some more metadata to this question to make it more readily discoverable? There's at least [one other poor soul](http://rstudio-pubs-static.s3.amazonaws.com/8795_79e543eb64a845aa9c9635ad63bc29da.html) out there in a state of confusion. – David Lovell Oct 01 '13 at 10:23