
Given a fresh session, executing a small ggparcoord(.) example provided in the documentation of the function

library(GGally)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in the following plot:

[Image: parallel coordinates plot of the sampled diamonds data]

Again, starting in a fresh session and executing the same script with dplyr loaded

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = diamonds.samp, columns = c(1, 5:10))

results in:

Error: (list) object cannot be coerced to type 'double'

Note that the order of the library(.) statements does not matter.

Questions

  1. Is there something wrong with the code samples?
  2. Is there a way to overcome the problem (e.g., via some namespace functions)?
  3. Or is this a bug?

I need both dplyr and ggparcoord(.) in a bigger analysis, but this minimal example reflects the problem I am facing.

Versions

  • R @ 3.2.3
  • dplyr @ 0.4.3
  • GGally @ 1.0.1
  • ggplot2 @ 2.0.0

UPDATE

To wrap the excellent answer given by Joran up:

Answers

  1. The code samples are in fact wrong, as ggparcoord(.) expects a data.frame, not a tbl_df as given by the diamonds data set (if dplyr is loaded).
  2. The problem is solved by coercing the tbl_df to a data.frame.
  3. No, it is not a bug.

Working code sample:

library(GGally)
library(dplyr)

data(diamonds, package="ggplot2")
diamonds.samp <- diamonds[sample(1:dim(diamonds)[1], 100), ]
ggparcoord(data = as.data.frame(diamonds.samp), columns = c(1, 5:10))
Jaap
Hannes
  • I have everything the same but GGally @ 1.0.0 and I have the same error in both code – HubertL Feb 10 '16 at 22:32
  • Did you reload the sessions between both code samples? Instead of reloading the session you can also detach the dplyr package (could be considered a workaround). – Hannes Feb 10 '16 at 22:46
  • The GGally package here is making the reasonable assumption that using `[` on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a `tbl_df` as well as a `data.frame`. When `dplyr` is loaded, the behavior of `[` is overridden such that `drop = FALSE` is always the default for a `tbl_df`. So there's a place in GGally where `data[,"cut"]` is expected to return a vector, but instead it returns another data frame. – joran Feb 10 '16 at 23:17
  • ...specifically, the error is thrown in your example while attempting to execute: `data[, fact.var] <- as.numeric(data[, fact.var])`. Since `data[,fact.var]` remains a data frame, and hence a list, `as.numeric` won't work. – joran Feb 10 '16 at 23:22
  • That is awesome. Nice two comments btw. Make them an answer! – Dirk Eddelbuettel Feb 10 '16 at 23:38
  • Thanks for the fast and good answer! – Hannes Feb 10 '16 at 23:59
  • @joran make a pull request to GGally. :) – Roman Luštrik Feb 11 '16 at 11:35
  • This totally is a bug! `GGally::ggparcoord()` breaks whenever `dplyr` is loaded, unless you work around it by coercing your data with `as.data.table(...)`, possibly with `keep.rownames=TRUE` if you don't want to lose all your rownames. – smci Mar 13 '17 at 12:31
  • @joran great job diagnosing. Can you please file a [GGally issue](https://github.com/ggobi/ggally/issues) and make a pull request? – smci Mar 13 '17 at 12:39
  • @smci Looks like it's already been fixed. They switched from `[` to `[[` which is probably best anyway. – joran Mar 14 '17 at 21:20

2 Answers


Converting my comments to an answer...

The GGally package here is making the reasonable assumption that using [ on a data frame should behave the way it always does and always has. However, this all being in the Hadley-verse, the diamonds data set is a tbl_df as well as a data.frame.

When dplyr is loaded, the behavior of [ is overridden such that drop = FALSE is always the default for a tbl_df. So there's a place in GGally where data[,"cut"] is expected to return a vector, but instead it returns another data frame.

...specifically, the error is thrown in your example while attempting to execute:

data[, fact.var] <- as.numeric(data[, fact.var])

Since data[,fact.var] remains a data frame, and hence a list, as.numeric won't work.
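The failure can be reproduced without GGally or dplyr. The following is only a minimal sketch in base R: the small data frame `df` is made up for illustration, and `drop = FALSE` is passed by hand to imitate what `[` does by default for a tbl_df. It also shows why `[[` sidesteps the issue, since it extracts the column itself.

```r
# Made-up example data; not the diamonds data set.
df <- data.frame(cut = factor(c("Fair", "Good", "Ideal")),
                 price = c(326, 327, 334))

# Plain data.frame behaviour: a single column is dropped to a vector (here a factor).
as_vector <- df[, "cut"]

# Assumption: drop = FALSE stands in for the tbl_df default, so a data frame comes back.
as_frame <- df[, "cut", drop = FALSE]

# as.numeric() is fine on the factor...
codes <- as.numeric(as_vector)

# ...but fails on the one-column data frame, because that is a list.
err <- tryCatch(as.numeric(as_frame),
                error = function(e) conditionMessage(e))

# `[[` extracts the column itself, whatever `[` would have done.
codes_from_double_bracket <- as.numeric(df[["cut"]])
```

After running this, `err` holds the coercion error message as a character string, while both `codes` and `codes_from_double_bracket` hold the integer codes of the factor.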

As for your conclusion that this isn't a bug, I'd say....maybe. Probably. At least there probably isn't anything the GGally package author ought to do to address it. You just have to be aware that using tbl_df's with non-Hadley written packages may break things.

As you noted, removing the extra class attributes fixes the problem, as it returns R to using the normal [ method.
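What "removing the extra class attributes" amounts to can be sketched without installing anything. Note the assumption: `fake_tbl` below is a hypothetical stand-in, an ordinary data frame tagged by hand with the tbl_df classes, not a real tibble. With dplyr loaded you would simply call as.data.frame() on the real object, as in the question's working code sample.

```r
# Hypothetical stand-in for a tbl_df: a data frame with the extra classes added by hand.
fake_tbl <- structure(
  data.frame(cut = factor(c("Fair", "Good")), price = c(326, 327)),
  class = c("tbl_df", "tbl", "data.frame")
)

# as.data.frame() strips the classes that sit in front of "data.frame".
plain <- as.data.frame(fake_tbl)
plain_class <- class(plain)

# With only the data.frame class left, `[` drops a single column to a vector again.
cut_column <- plain[, "cut"]
```

Once the object is a plain data.frame, code that relies on `data[, "cut"]` returning a vector works as its author expected.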

joran
  • For those wondering, this doesn't happen for `data.table`. Although `data.table` also overrides `[` it has a mechanism for automatic compatibility with packages expecting a `data.frame`, described [here](http://stackoverflow.com/a/10529888/403310). – Matt Dowle Feb 22 '16 at 06:50

Workaround: coerce the data you pass to ggparcoord with as.data.table(...), or with as.data.table(..., keep.rownames=TRUE) if you don't want to lose all your rownames.

Cause: as per @joran's investigation, when dplyr is loaded, tbl_df overrides [ so that drop = FALSE is the default.

Solution: file a pull-request on GGally. edit: fixed in v1.3.0 (https://github.com/ggobi/ggally/commit/bfa930d102289d723de2ce9ec528baf42b3b7b40)

Gerhard Burger
smci