First of all, I'm still a beginner. I'm trying to draw and interpret a stacked bar plot with R. I have already looked at a number of answers, but some were not specific to my case and others I simply didn't understand.

I've got a dataset dvl that has five columns, Variant, Region, Time, Person and PrecededByPrep. I'd like to make a multivariate comparison of Variant to the other four predictors. Every column can have one of two possible values:

  • Variant: elk or ieder
  • Region: VL or NL
  • Time: time or no time
  • Person: person or no person
  • PrecededByPrep: 1 or 0

Here's the logistic regression

From the answers I gathered that the library ggplot2 might be the best drawing library to go with. I've read its documentation but for the life of me I can't figure out how to plot this: how can I get a comparison of Variant with the other four factors?

It took me a while, but I made something similar in Photoshop to what I'd like (fictional values!).

graph

  • Dark gray/light gray: possible values of Variant
  • y-axis: frequency
  • x-axis: every column, subdivided into its possible values

I know how to make individual bar plots, both stacked and grouped, but I do not know how to make bar plots that are stacked and grouped at the same time. ggplot2 can be used, but if it can be done without it, I'd prefer that.

I think this can be seen as a sample dataset, though I'm not entirely sure. I am a beginner with R and I read about creating a sample set.

t <- data.frame(Variant = sample(c("iedere","elke"),size = 50, replace = TRUE),
            Region = sample(c("VL","NL"),size = 50, replace = TRUE),
            PrecededByPrep = sample(c("1","0"),size = 50, replace = TRUE),
            Person = sample(c("person","no person"),size = 50, replace = TRUE),
            Time = sample(c("time","no time"),size = 50, replace = TRUE))

I'd like the plot to be aesthetically pleasing as well. What I had in mind:

  • Plot colours (i.e. for the bars): col=c("paleturquoise3", "palegreen3")
  • A bold font for the axis labels font.lab=2 but not for the value labels (e.g. `Region` in bold, but `VL` and `NL` not in bold)
  • #404040 as a colour for the font, axis and lines
  • Labels for the axes: x: factors, y: frequency
Bram Vanroy
  • Small points: Having several predictors doesn't make your analysis "multivariate"; that was common usage into the 1970s, but no longer. I've edited "bivalent" to "binary". – Nick Cox Jan 06 '15 at 10:09
  • Large point: It seems that you are in essence asking for R code. That would make this off-topic: see the Help Center for advice on software-related questions. There is scope for making this more statistical, but you would need to expand on which kinds of plots you imagine; it's entirely open-ended at present, so arguably too broad. – Nick Cox Jan 06 '15 at 10:12
  • @NickCox Please see my edit, I put a lot of effort into it so I hope it's sufficient to make a more workable question. – Bram Vanroy Jan 06 '15 at 16:20
  • Does [this](http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_%28ggplot2%29/#bar-graphs) or [this](http://www.statmethods.net/graphs/bar.html) help you? – Tim Jan 06 '15 at 16:28
  • @Tim Not really, as those links do not show how to have stacked and grouped graphs in one, only the separate possibilities. – Bram Vanroy Jan 06 '15 at 16:51
  • In your plot you have Region = VL or NL. In your table, Region is 'elke' or 'ieder'. Can you please update your toy data accordingly. – Henrik Jan 08 '15 at 10:56
  • @Henrik My bad, I forgot to add a factor. I updated the data. – Bram Vanroy Jan 08 '15 at 11:06

3 Answers

Here is one possibility: start with the 'un-tabulated' data frame, melt it, plot it with geom_bar in ggplot2 (which does the counting per group), and separate the plot by variable using facet_wrap.

Create toy data:

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
           Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
           PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
           Person = sample(c("person", "no person"), size = 50, replace = TRUE),
           Time = sample(c("time", "no time"), size = 50, replace = TRUE))

Reshape data:

library(reshape2)
df2 <- melt(df, id.vars = "Variant")
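
To see what the reshaping step produces, here is a rough base-R sketch of the same idea. It is only an illustration and not part of the original answer: the names `vars` and `long` are made up here, and the result is assumed to have the same layout as `df2` (one row per original row and predictor, with columns Variant, variable and value).

```r
# Toy data as in the answer above
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))

# Every column except the id column Variant gets stacked
vars <- setdiff(names(df), "Variant")

# Hypothetical base-R stand-in for melt(df, id.vars = "Variant")
long <- data.frame(
  Variant  = rep(df$Variant, times = length(vars)),
  variable = factor(rep(vars, each = nrow(df)), levels = vars),
  value    = unlist(lapply(vars, function(v) as.character(df[[v]])),
                    use.names = FALSE)
)

head(long)  # columns: Variant, variable, value
```

Each of the 50 original rows appears once per predictor, so the long table has 4 x 50 rows. That is what lets geom_bar count per value and facet_wrap split per variable.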

Plot:

library(ggplot2)
ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_wrap(~variable, nrow = 1, scales = "free_x") +
  scale_fill_grey(start = 0.5) +
  theme_bw()

There are lots of opportunities to customize the plot, such as setting order of factor levels, rotating axis labels, wrapping facet labels on two lines (e.g. for the longer variable name "PrecededByPrep"), or changing spacing between facets.

Customization (following updates in question and comments by OP)

# labeller used in facet_grid to wrap "PrecededByPrep" on two lines
# see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
# (written with as_labeller() because ggplot2 2.0.0 changed the labeller
# interface, so the older function(var, value) form may fail there)
my_lab <- as_labeller(function(value) {
  ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
})

ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_grid(~variable, scales = "free_x", labeller = my_lab) + 
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
  theme_bw() +
  theme(axis.text = element_text(face = "bold"), # axis tick labels bold 
        axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
        line = element_line(colour = "gray25"), # line colour gray25 = #404040
        strip.text = element_text(face = "bold")) + # facet labels bold  
  xlab("factors") + # set axis labels
  ylab("frequency")

Add counts to each bar (edit following comments from OP).

The basic principles to calculate the y coordinates can be found in this Q&A. Here I use dplyr to calculate counts per bar (i.e. label in geom_text) and their y coordinates, but this could of course be done in base R, plyr or data.table.

# calculate counts (i.e. labels for geom_text) and their y positions.
library(dplyr)
df3 <- df2 %>%
  group_by(variable, value, Variant) %>%
  summarise(n = n()) %>%
  mutate(y = cumsum(n) - (0.5 * n))

# plot
ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
  geom_bar() +
  geom_text(data = df3, aes(y = y, label = n)) +
  facet_grid(~variable, scales = "free_x", labeller = my_lab) + 
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) + # manual fill colors
  theme_bw() +
  theme(axis.text = element_text(face = "bold"), # axis tick labels bold 
        axis.text.x = element_text(angle = 45, hjust = 1), # rotate x axis labels
        line = element_line(colour = "gray25"), # line colour gray25 = #404040
        strip.text = element_text(face = "bold")) + # facet labels bold  
  xlab("factors") + # set axis labels
  ylab("frequency")
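
The manual y positions above depend on the order in which ggplot2 stacks the bar segments. As far as I know that order was reversed in ggplot2 2.2.0, so with newer versions the labels from df3 may land in the wrong segment. A hedged alternative is to let ggplot2 count and position the labels itself. The sketch below assumes ggplot2 3.3.0 or later (for `after_stat()`) and builds the long data in base R so that it runs on its own; the names `long` and `p` are made up for the illustration.

```r
library(ggplot2)  # after_stat() assumes ggplot2 3.3.0 or later

# Toy data as in the answer above
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))

# Long format (one row per original row and predictor), built in base R
vars <- setdiff(names(df), "Variant")
long <- data.frame(
  Variant  = rep(df$Variant, times = length(vars)),
  variable = factor(rep(vars, each = nrow(df)), levels = vars),
  value    = unlist(lapply(vars, function(v) as.character(df[[v]])),
                    use.names = FALSE)
)

p <- ggplot(long, aes(x = value, fill = Variant)) +
  geom_bar() +
  # same counting as geom_bar; vjust = 0.5 puts each label mid-segment
  geom_text(stat = "count", aes(label = after_stat(count)),
            position = position_stack(vjust = 0.5)) +
  facet_grid(~variable, scales = "free_x") +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3"))
print(p)
```

Because the bars and the labels use the same stat and the same stacking position, they stay aligned whatever the stacking order is. The text size can still be set inside geom_text, for example with `size = 3`.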

Henrik
  • This is getting very close to what I want. I edited my OP with some extra information concerning the aesthetics of the plot. Could you consider these as well? And is it possible to have no overlap with the values? (e.g. The column `PrecededByPrep` is so wide that it can hold the label `PrecededByPrep` (no overflow), `Person` is so wide that it can hold the text for the values `no person` and `person`). I should have mentioned this earlier, but wasn't thinking about the appearance of the plot then. I'm sorry. – Bram Vanroy Jan 08 '15 at 12:16
  • This looks great! I'm trying to edit it a little bit, but I don't know how to target the specific labels. How can I for instance give `Region` another colour and another background colour? – Bram Vanroy Jan 08 '15 at 13:37
  • There's quite a few posts on SO on conditional formatting of facets and their strips. See e.g. [**here**](http://stackoverflow.com/questions/9847559/conditionally-change-panel-background-with-facet-grid) and [**here**](http://stackoverflow.com/questions/6750664/how-to-change-the-format-of-an-individual-ggplot2-facet-plot). – Henrik Jan 08 '15 at 13:51
  • Though it doesn't belong to my original answer, I was wondering (as you might have noticed from my other answers) if there is a way to add values to the different bars, for green as well as blue. – Bram Vanroy Jan 10 '15 at 13:01
  • Oh, this is great! With which handle can I control the size of the text? I'd like it a bit smaller, but setting `text = element_text(size=12)` doesn't seem to affect the values. Does it have another handle? – Bram Vanroy Jan 10 '15 at 14:07
  • The size of `geom_text` is _set_ in `geom_text` (please see the second example of [`?geom_text`](http://docs.ggplot2.org/current/geom_text.html)). In the [**vignette for `theme`**](http://docs.ggplot2.org/dev/vignettes/themes.html) (also posted in my answer to one of your previous questions) the first line reads: "The theming system in ggplot2 enables a user to control _non-data elements_ of a ggplot object". – Henrik Jan 10 '15 at 14:22
  • I guess I still have to grasp the very structure and build of R as a language. It seems so ... strange to me, coming from a webcoding background. Anyway, thanks. This works perfectly and I am assigning the bounty to you! – Bram Vanroy Jan 10 '15 at 14:42
  • Glad to hear that it worked the way you wanted! Regarding learning R, I suppose you know that there are many free documents [**here**](http://stackoverflow.com/tags/r/info). In addition, when it comes to 'the very structure', you may have a look at the 'Object' and 'Actions' sections [**here**](http://www.burns-stat.com/documents/tutorials/impatient-r/) and first few 'chapters' in the 'Foundations' section [**here**](http://adv-r.had.co.nz/). – Henrik Jan 10 '15 at 15:03
  • In addition to the official `ggplot` docs, I believe [**this site**](http://sape.inf.usi.ch/quick-reference/ggplot2) gives a good feeling for the 'ggplot way' of building a plot. [**This**](http://www.cookbook-r.com/Graphs/) is a nice tutorial. – Henrik Jan 10 '15 at 15:07
  • Thanks for that information! Sometimes I hate asking a question because I know that I am not familiar with the subject, but I'm always glad when someone makes an effort to teach me something, rather than chewing the whole thing for me and spitting out the solution. +1 – Bram Vanroy Jan 10 '15 at 16:55

Here is my proposed solution using the barplot function from base R:

1. calculate the counts

l_count_df<-lapply(colnames(t)[-1],function(nomcol){table(t$Variant,t[,nomcol])})
count_df<-l_count_df[[1]]
for (i in 2:length(l_count_df)){
    count_df<-cbind(count_df,l_count_df[[i]])
}
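
As a side note, the loop in step 1 can probably be shortened with `do.call(cbind, ...)`. The sketch below is an illustration under assumptions, not the answer's own code: it uses its own toy data frame called `dat` (rather than `t`, which also names a base R function), and `tabs` and `count_mat` are made-up names.

```r
# Toy data with the same columns as in the question
set.seed(1)
dat <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                  Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                  PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                  Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                  Time = sample(c("time", "no time"), size = 50, replace = TRUE))

# One 2 x 2 table (Variant by predictor) per predictor column
tabs <- lapply(colnames(dat)[-1], function(nomcol) {
  table(dat$Variant, dat[[nomcol]])
})

# Bind all tables side by side in one call instead of a for loop
count_mat <- do.call(cbind, tabs)
count_mat
```

The result has one row per value of Variant and two columns per predictor, which is the shape that barplot expects for stacked bars.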

2. draw the barplot without axis names, saving the bar coordinates

par(las = 1, col.axis = "#404040", mar = c(5, 4.5, 4, 2), mgp = c(3.5, 1, 0))
bp <- barplot(count_df, width = 1.2, space = rep(c(1, 0.3), 4),
              col = c("paleturquoise3", "palegreen3"), border = "#404040",
              axisnames = FALSE, ylab = "Frequency",
              legend.text = row.names(count_df),
              ylim = c(0, max(colSums(count_df)) * 1.2))

3. label the bars

mtext(side=1,line=0.8,at=bp,text=colnames(count_df))
mtext(side=1,line=2,at=(bp[seq(1,8,by=2)]+bp[seq(2,8,by=2)])/2,text=colnames(t)[-1],font=2)

4. add values inside the bars

for(i in 1:ncol(count_df)){
    val_elke<-count_df[1,i]
    val_iedere<-count_df[2,i]
    text(bp[i],val_elke/2,val_elke)
    text(bp[i],val_elke+val_iedere/2,val_iedere)
}

Here is what I get (with my random data):

Cath
  • Is it possible that your last command is incomplete? R doesn't seem to want to run it. EDIT: you missed a parenthesis at the end! – Bram Vanroy Jan 08 '15 at 15:42
  • @BramVanroy ok, I had a bracket running at the end of my answer, I was wondering what it was doing there (so I deleted it...) but I just included the picture before the closing bracket of the last instruction... Really sorry for that !! (it is corrected) – Cath Jan 08 '15 at 15:44
  • I am considering this for accepting because it doesn't need any libraries. (+1!) It's great! Is it possible to 1. label the y-axis with "Frequency" and have a little more space between the labels and the values (e.g. between Region and NL/VL). – Bram Vanroy Jan 08 '15 at 15:46
  • Thanks !! You can label the yaxis with `ylab` and control the spaces between labels with line. I'll change the parameters. Tell me if it's ok – Cath Jan 08 '15 at 15:47
  • It's fine in the middle and rotated, though the distance to the axis could be a bit more. I.e. "frequency" a bit further away from the y-axis. Is that possible? – Bram Vanroy Jan 08 '15 at 15:55
  • Yes, that would be fine. (On my screen it is but a few pixels from the axis unfortunately.) – Bram Vanroy Jan 08 '15 at 16:01
  • @BramVanroy, I changed the left margin (with `mar`) and the position of axis' title (with `mgp`) to translate them 0.5 line further to the left. Let me know if it is ok for you now. (I didn't update the graphic) – Cath Jan 08 '15 at 20:19
  • Ah yes, this is great! I'll just wait a couple of days before assigning the bounty though. Maybe others will find their way here and give an even better solution (though I doubt it). – Bram Vanroy Jan 09 '15 at 09:12
  • Great, I'm glad my solution does now exactly what you want :-). It seems fair to wait. (Nothing to do with barplots but I saw you did a C2C cover : do you know they're from "my" town ?! :-) ) – Cath Jan 09 '15 at 09:19
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/68469/discussion-between-bram-vanroy-and-cathg). – Bram Vanroy Jan 09 '15 at 09:36
  • Though it doesn't belong to my original answer, I was wondering (as you might have noticed from my other answers) if there is a way to add values to the different bars, for green as well as blue. – Bram Vanroy Jan 10 '15 at 13:02
  • No worries Cath! Though I already accepted Henrik's answer as I needed it yesterday. But thank you for the effort nonetheless! – Bram Vanroy Jan 11 '15 at 11:45

I'm basically answering a different question. I suppose this can be seen as perversity on my part, but I really dislike barplots of pretty much any sort. They have always seemed to me to waste space, because the numerical information they present is less useful than an appropriately constructed table. The vcd package offers an extended mosaicplot function that seems to me to be more accurately called a "multivariate barplot" than any of the ones I have seen so far. It does require that you first construct a contingency table, for which the xtabs function seems a perfect fit.

install.packages("vcd")
library(vcd)
help(mosaic, package = "vcd")
col <- c("paleturquoise3", "palegreen3")
# ttt is presumably the sample data frame from the question (called t there)
vcd::mosaic(xtabs(~ Variant + Region + PrecededByPrep + Time, data = ttt),
            highlighting = "Variant", highlighting_fill = col)
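
Since the argument above is that a well-constructed table can convey the counts directly, here is a small sketch of what xtabs returns and how it could be flattened for reading. It runs on its own with a made-up toy data frame `dat` (not the `ttt` object used above), and the names `tab` and `flat` are hypothetical.

```r
# Toy data with the same columns as in the question
set.seed(1)
dat <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                  Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                  PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                  Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                  Time = sample(c("time", "no time"), size = 50, replace = TRUE))

# 4-way contingency table of counts, the kind of object passed to mosaic()
tab <- xtabs(~ Variant + Region + PrecededByPrep + Time, data = dat)

# Flatten it: predictors in the rows, the two Variant values in the columns
flat <- ftable(tab, row.vars = c("Region", "PrecededByPrep", "Time"))
flat
```

Each cell of `flat` is the number of rows with that combination of values, so the whole table sums to the number of observations.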

That was the 4-way plot and this is the 5-way plot:

png(); vcd::mosaic( xtabs(
                  ~Variant+Region + PrecededByPrep +   Person  +  Time, 
                   data=ttt) 
                ,highlighting="Variant", highlighting_fill=col); dev.off()

IRTFM
  • Thank you for your answer. I have considered a mosaic plot, but I just don't think it is as clear as bar plots, though you are right in saying that they save more space. – Bram Vanroy Jan 09 '15 at 09:10