
I would like to plot several boxplots in one chart. I know there are already similar threads out there, but none of them seems to apply to my case.

Description of the data I need to visualize: I have voter survey data on which policy areas respondents perceive as important (x), and I want to contrast that with data on how likely these voters are ever to vote for a given party (y). This is straightforward for one party, but the idea is to plot all three parties in one figure (otherwise we would have too many figures, and they would be hard to compare with one another).

So let's take these hypothetical data (MyData):

Party_A     Party_B     Party_C     Salience
8           2           5           "Environmental policy"
7           0           4           "Environmental policy"
9           3           6           "Environmental policy"
0           9           4           "Tax policy"
1           8           3           "Tax policy"
2           6           3           "Tax policy"
2           3           9           "Immigration policy"
3           5           9           "Immigration policy"
1           6           0           "Immigration policy"

Here, Party_A to Party_C represent "Would you ever consider voting for...? (0-10 scale)", and Salience indicates which policy area a respondent mentioned as important. (I also have a set of binary variables, named after the respective policy areas, that are 1 if that area was mentioned and 0 otherwise, just in case this is needed.)

Now this is what I tried:

library(ggplot2)
ggplot(MyData, aes(Salience,Party_A)) + geom_boxplot(fill="black", alpha=.5) +
geom_boxplot(aes(Salience,Party_B), fill="blue", alpha=.5) +
geom_boxplot(aes(Salience,Party_C), alpha=.5) +
geom_hline(yintercept=5, color="darkred", linetype="dotted") + 
theme(text=element_text(family="serif"), panel.background=element_blank(),
    axis.text.x=element_text(angle=90,hjust=1,vjust=.3))

What this gives me is this: [image of the resulting plot not available]

There are two issues with this that I cannot solve:

  1. The boxes are drawn on top of each other, and even with alpha=.5 the result looks messy and nothing can be compared. Is there a way to group them, for example as a cluster of three boxes for each policy area? It would be nice to do it as in the example I linked, but my data structure clearly doesn't allow this simply by adding fill=labels as a group indicator.
  2. Another issue is the NA bar that I cannot get rid of. I tried both including na.omit() in the ggplot code and subsetting the data beforehand with MyData[!is.na(MyData)]. In both cases the chart disappears.

Is there any solution to this? Grateful for any advice!

pogibas
Dr. Fabian Habersack
  • please add your original data using `dput`, there's no `NA` category now. – pogibas Apr 13 '18 at 11:37
  • This is going to sound weird, but Boxplots do not have x variables (i.e. the party) so you need to use group. – Elin Apr 13 '18 at 11:41
  • You need to reshape your data, and only use boxplots once in your code, e.g.: `MyData2 <- tidyr::gather(MyData, Party, value, -Salience); ggplot(MyData2, aes(Salience, value, fill = Party)) + geom_boxplot()` – Axeman Apr 13 '18 at 11:41
  • `MyData[!is.na(MyData)]` What are you thinking that does? You need to refer to a specific column or columns there. – Elin Apr 13 '18 at 11:43
  • The link you give actually shows how to reshape your data too. – Axeman Apr 13 '18 at 11:48
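Following Elin's point that the NA filter has to refer to a specific column, here is a minimal sketch on hypothetical toy data (the missing Salience value is made up for illustration, since the posted data contains no NA). Indexing a data frame with a logical matrix, as in MyData[!is.na(MyData)], returns a plain vector rather than a data frame, which would explain why the chart disappears; filtering rows on the Salience column keeps the data frame intact.

```r
# Hypothetical toy data with the same columns as MyData and one missing Salience
MyData <- data.frame(
  Party_A  = c(8, 0, 2, 5),
  Party_B  = c(2, 9, 3, 4),
  Party_C  = c(5, 4, 9, 1),
  Salience = c("Environmental policy", "Tax policy", "Immigration policy", NA),
  stringsAsFactors = FALSE
)

# A logical-matrix index flattens the data frame into a character vector
flat <- MyData[!is.na(MyData)]
is.data.frame(flat)   # FALSE

# Filter rows on the Salience column instead; the result is still a data frame
MyData_clean <- MyData[!is.na(MyData$Salience), ]
nrow(MyData_clean)    # 3
```

MyData_clean can then be passed to ggplot (or reshaped first) without producing an NA category on the x axis.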

1 Answer


One way of doing this is to work with your data in long format. This also makes your plotting command shorter and clearer.

You can use the melt function from the reshape2 package.

library(ggplot2)
library(reshape2)

Here is your example data:

dat <- read.table(text='Party_A     Party_B     Party_C     Salience
8           2           5           "Environmental policy"
7           0           4           "Environmental policy"
9           3           6           "Environmental policy"
0           9           4           "Tax policy"
1           8           3           "Tax policy"
2           6           3           "Tax policy"
2           3           9           "Immigration policy"
3           5           9           "Immigration policy"
1           6           0           "Immigration policy"', 
                  header=TRUE)

Melt the data into long format, with one row per respondent and party:

dat.m <- melt(dat, id.vars = "Salience", variable.name = "Party", value.name = "Vote")

Then plot the data. Because fill=Party defines one group per party, ggplot automatically places the three boxplots side by side within each policy area:

ggplot(data=dat.m, aes(x=Salience, y=Vote, fill=Party)) +
  geom_boxplot(alpha=0.5) + 
  scale_fill_manual(values=c("black", "blue", "white")) +
  geom_hline(yintercept=5, color="darkred", linetype="dotted") + 
  theme(text=element_text(family="serif"), panel.background=element_blank(),
        axis.text.x=element_text(angle=90,hjust=1,vjust=.3))

Result with the melted data: [image not available]
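If you would rather not add a package dependency for the reshaping step, base R's stack() can produce the same long format as melt(). A minimal sketch, assuming the three party columns are named Party_A to Party_C as in the example; it re-creates dat so that it runs on its own.

```r
# Re-create the example data so the block is self-contained
dat <- data.frame(
  Party_A  = c(8, 7, 9, 0, 1, 2, 2, 3, 1),
  Party_B  = c(2, 0, 3, 9, 8, 6, 3, 5, 6),
  Party_C  = c(5, 4, 6, 4, 3, 3, 9, 9, 0),
  Salience = rep(c("Environmental policy", "Tax policy", "Immigration policy"),
                 each = 3),
  stringsAsFactors = FALSE
)

parties <- c("Party_A", "Party_B", "Party_C")

# stack() concatenates the party columns one after another and returns two
# columns: 'values' (the scores) and 'ind' (a factor naming the source column)
long <- stack(dat[parties])
names(long) <- c("Vote", "Party")

# Each party block keeps the original row order, so Salience repeats once per party
long$Salience <- rep(dat$Salience, times = length(parties))

head(long, 3)
```

The resulting long data frame can replace dat.m in the ggplot call above, e.g. ggplot(long, aes(x = Salience, y = Vote, fill = Party)) + geom_boxplot(alpha = 0.5).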

Vincent Guillemot