I'm having a problem with geom_violin. To make my problem reproducible, I created a toy dataset.
Let's say this is my original data:
require(jsonlite)
data <- fromJSON("[{\"Season\":\"Spring\",\"Maximum.Profit\":2520,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":1710,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":2500,\"Hidden\":\"No\"},{\"Season\":\"Spring\",\"Maximum.Profit\":2850,\"Hidden\":\"Yes\"},{\"Season\":\"Spring\",\"Maximum.Profit\":3500,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":5740,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":5100,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":1710,\"Hidden\":\"No\"},{\"Season\":\"Summer\",\"Maximum.Profit\":3500,\"Hidden\":\"Yes\"},{\"Season\":\"Summer\",\"Maximum.Profit\":8000,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":4920,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":720,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":13740,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":2600,\"Hidden\":\"Yes\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":3810,\"Hidden\":\"No\"},{\"Season\":\"Autumn\",\"Maximum.Profit\":-1260,\"Hidden\":\"No\"}]")
Here is my code for visualizing it:
require(ggplot2)
p <- ggplot(data, aes(x=Season, y=Maximum.Profit))
p <- p + geom_violin(aes(color=Hidden)) + geom_boxplot(aes(fill=Hidden))
Unlike geom_boxplot, geom_violin ignored the Hidden = "Yes" group in every Season. I realized there was only a single value in each of these Season_Hidden combinations: "Autumn_Yes", "Spring_Yes", "Summer_Yes". So I added one more value to each. To avoid creating identical values, I made the new ones slightly different from the originals. You can see them in the last 3 entries of data2:
data2 <- fromJSON("[{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":2520},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":1710},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":2500},{\"Season\":\"Spring\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2850},{\"Season\":\"Spring\",\"Hidden\":\"No\",\"Maximum.Profit\":3500},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":5740},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":5100},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":1710},{\"Season\":\"Summer\",\"Hidden\":\"Yes\",\"Maximum.Profit\":3500},{\"Season\":\"Summer\",\"Hidden\":\"No\",\"Maximum.Profit\":8000},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":4920},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":720},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":13740},{\"Season\":\"Autumn\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2600},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":3810},{\"Season\":\"Autumn\",\"Hidden\":\"No\",\"Maximum.Profit\":-1260},{\"Season\":\"Autumn\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2607.2},{\"Season\":\"Spring\",\"Hidden\":\"Yes\",\"Maximum.Profit\":2857.2},{\"Season\":\"Summer\",\"Hidden\":\"Yes\",\"Maximum.Profit\":3507.2}]")
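For reference, this is roughly how the group sizes can be checked. It is only a sketch: to keep it self-contained I rebuilt the toy data by hand with data.frame() instead of fromJSON (so the column order differs from data2 above), and the names extra, counts_data, counts_data2 and counts_doubled are just placeholders I made up for this check.

```r
# Rebuild the toy data by hand (same values as the JSON above; the column
# order is an assumption and differs from data2 as parsed by fromJSON)
data <- data.frame(
  Season = rep(c("Spring", "Summer", "Autumn"), times = c(5, 5, 6)),
  Maximum.Profit = c(2520, 1710, 2500, 2850, 3500,
                     5740, 5100, 1710, 3500, 8000,
                     4920, 720, 13740, 2600, 3810, -1260),
  Hidden = c("No", "No", "No", "Yes", "No",
             "No", "No", "No", "Yes", "No",
             "No", "No", "No", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# The three extra Hidden = "Yes" rows that were appended to get data2
extra <- data.frame(
  Season = c("Autumn", "Spring", "Summer"),
  Maximum.Profit = c(2607.2, 2857.2, 3507.2),
  Hidden = "Yes",
  stringsAsFactors = FALSE
)
data2 <- rbind(data, extra)
doubled <- rbind(data2, data2)

# Number of observations in each Season x Hidden group
counts_data    <- table(data$Season, data$Hidden)
counts_data2   <- table(data2$Season, data2$Hidden)
counts_doubled <- table(doubled$Season, doubled$Hidden)

print(counts_data)
print(counts_data2)
print(counts_doubled)
```

So each Hidden = "Yes" group has one observation in data, two in data2, and four in rbind(data2, data2).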
But data2 produced the same figure as data, so I forced it a little bit more:
p <- ggplot(rbind(data2, data2), aes(x=Season, y=Maximum.Profit))
p <- p + geom_violin(aes(color=Hidden), scale="width", position=position_dodge(width=1))
p <- p + geom_boxplot(aes(fill=Hidden), position=position_dodge(width=1), width=0.2)
(The additional settings for geom_violin and geom_boxplot are not important. I just added them to make the plot prettier.)
Now this is the picture that I want, but I don't want to get it in a sneaky way, such as using rbind(data2, data2) instead of data in the previous example.
Does anyone know a better and more stable solution for this issue? How can I make geom_violin NOT ignore groups with very few (low-variance) values, or at least leave their side blank, so that it doesn't mess up the layout when combined with another geom (geom_boxplot in this case)?