I'm having a problem with geom_violin. To make my problem reproducible, I created a toy dataset.

Let's say this is my original data:

require(jsonlite)
data <- fromJSON("[{\"Season\":\"Spring\",\"Maximum.Profit\":2520,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":1710,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":2500,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":2850,\"Hidden\":\"Yes\"},{\"Season\":\"Spring\",\"Maximum.Profit\":3500,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":5740,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":5100,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":1710,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":3500,\"Hidden\":\"Yes\"},{\"Season\":\"Summer\",\"Maximum.Profit\":8000,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":4920,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":720,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":13740,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":2600,\"Hidden\":\"Yes\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":3810,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":-1260,\"Hidden\":\"No\"}]")

Here is my code for visualizing it:

require(ggplot2)
p <- ggplot(data, aes(x=Season, y=Maximum.Profit))
p <- p + geom_violin(aes(color=Hidden)) + geom_boxplot(aes(fill=Hidden))

[Image: plot produced by the code above]

Unlike geom_boxplot, geom_violin ignored the Hidden = "Yes" group in every Season. I realized that each of these Season/Hidden combinations ("Autumn_Yes", "Spring_Yes", "Summer_Yes") contains only a single value, so I added one more value to each. To avoid identical values, I made the new ones slightly different. They are the last three records of data2:

data2 <- fromJSON("[{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":2520},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":1710},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":2500},{\"Season\":\"Spring\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2850},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":3500},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":5740},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":5100},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":1710},{\"Season\":\"Summer\",\"Hidden\":\"Yes\",\"Maximum.Profit\":3500},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":8000},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":4920},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":720},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":13740},{\"Season\":\"Autumn\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2600},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":3810},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":-1260},{\"Season\":\"Autumn\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2607.2},{\"Season\":\"Spring\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2857.2},{\"Season\":\"Summer\",\"Hidden\":\"Yes\",\"Maximum.Profit\":3507.2}]")
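A quick way to confirm which Season/Hidden groups are too small is to tabulate the group sizes. This is only a minimal sketch in base R: it rebuilds the original toy data as a plain data frame, and the names `group_sizes` and `small_groups` are hypothetical, chosen just for this illustration.

```r
# Rebuild the toy data from the question as a plain data frame
data <- data.frame(
  Season = rep(c("Spring", "Summer", "Autumn"), times = c(5, 5, 6)),
  Maximum.Profit = c(2520, 1710, 2500, 2850, 3500,
                     5740, 5100, 1710, 3500, 8000,
                     4920, 720, 13740, 2600, 3810, -1260),
  Hidden = c("No", "No", "No", "Yes", "No",
             "No", "No", "No", "Yes", "No",
             "No", "No", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Count the observations in each Season x Hidden cell
group_sizes <- as.data.frame(
  table(Season = data$Season, Hidden = data$Hidden),
  responseName = "n"
)
print(group_sizes)

# Groups with a single observation: a density cannot be estimated for these
small_groups <- group_sizes[group_sizes$n < 2, ]
print(small_groups)
```

Running the same tabulation on data2 would show two observations in each "Yes" group.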

But data2 produced the same figure as data. So I forced it a little bit more:

p <- ggplot(rbind(data2, data2), aes(x=Season, y=Maximum.Profit))
p <- p + geom_violin(aes(color=Hidden), scale="width", position=position_dodge(width=1))
p <- p + geom_boxplot(aes(fill=Hidden), position=position_dodge(width=1), width=0.2)

(The additional settings for geom_violin and geom_boxplot are not important. I only added them to make the plot prettier.)

[Image: plot produced by the code above]

Now this is the picture that I want, but I don't want to get it in a sneaky way, such as passing rbind(data2, data2) instead of the data used in the previous example.

Does anyone know a better and more stable solution for this issue? How can I make geom_violin NOT ignore groups with too few (or low-variance) values, or at least leave that side blank so the plot doesn't get messed up when combined with other geoms (a boxplot in this case)?
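One possible way to get the "leave one side blank" behaviour without duplicating rows is to avoid dodging altogether and give every Season/Hidden combination its own discrete x position. The sketch below is an assumption on my side, not something confirmed in this thread: `Slot` is a hypothetical helper column built with `interaction()`, and the plot is only constructed if ggplot2 is installed. Because each combination has its own slot, a group that geom_violin drops should simply leave its slot empty while the boxplot stays in place.

```r
# Toy data from the question, rebuilt as a plain data frame
data <- data.frame(
  Season = rep(c("Spring", "Summer", "Autumn"), times = c(5, 5, 6)),
  Maximum.Profit = c(2520, 1710, 2500, 2850, 3500,
                     5740, 5100, 1710, 3500, 8000,
                     4920, 720, 13740, 2600, 3810, -1260),
  Hidden = c("No", "No", "No", "Yes", "No",
             "No", "No", "No", "Yes", "No",
             "No", "No", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: one x slot per Hidden/Season pair.
# Hidden varies fastest, so the two slots of a Season sit next to each other.
data$Slot <- interaction(data$Hidden, data$Season, sep = ".")
print(levels(data$Slot))

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  # No position_dodge() needed: x already separates the groups
  p <- ggplot(data, aes(x = Slot, y = Maximum.Profit)) +
    geom_violin(aes(color = Hidden), scale = "width") +
    geom_boxplot(aes(fill = Hidden), width = 0.2) +
    labs(x = "Hidden.Season")
  # Call print(p) to draw the plot
}
```

The trade-off is that the x axis now shows six combined labels instead of three Seasons, so the axis text may need some relabelling.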

  • You can refer to the answer to [this post](https://stackoverflow.com/questions/15367762/include-space-for-missing-factor-level-used-in-fill-aesthetics-in-geom-boxplot), which modifies the plot object's positions. (This assumes that you are aware of the issue... I don't know of any solution that automatically dodges empty / insufficiently small groups for `geom_violin`.) – Z.Lin Aug 30 '17 at 07:00
  • Thanks. I think that way might work. Let me try making a function to adjust the `x` values from `ggplot_build` and see what it looks like. – Tri M. Le Aug 30 '17 at 07:49
