I'm using R to create my graphics for the Walker alias tables I am using in my thesis. I have managed to produce every graph using ggplot2, except for the last one where the alias values are allocated so the probability in each column equals 1.

The graph with the probabilities scaled prior to creating the aliases is:

foo <- data.frame(Buscount=c(1,2,3,4,5), Rescaled.busfreq= c(5/9, 10/9, 15/9, 10/9, 5/9))
ggplot(foo, aes(x=factor(Buscount),y=Rescaled.busfreq, fill=factor(Buscount))) +
geom_bar(stat="identity", width=1) + 
scale_fill_manual(values=c("cyan","magenta2","gold","gray","darkolivegreen3", "black")) +
scale_x_discrete(labels=c("a-2", "a-1", "a", "a+1", "a+2"), expand=c(0,0), name="Real count") +
scale_y_continuous(breaks=seq(0,15/9, by=3/9),labels=c("0", "3/9","6/9","9/9", "12/9", "15/9"), expand=c(0,0),
                 name="Adjusted probability of count") +
geom_rect(data=NULL, aes(xmin = 0.5, xmax = 5.5, ymin = 0, ymax = 9/9), color="black", fill=NA, size=1.5) +
geom_vline(xintercept=c(1.5, 2.5, 3.5, 4.5), color="gray") +
theme(panel.grid.minor.y=element_blank(), 
    panel.grid.major.y=element_line(color="gray"),
    panel.background=element_blank(), legend.position="none",
    axis.line = element_line(color="gray", size = 1))

This produces the desired output:

[Figure: Walker alias table before alias adjustment]

I thought a stacked bar graph in ggplot2 would be the most convenient method of fitting the values into the 1 x 5 plane, but I can't get the stacked bar graph to work. This is the code I have ended up with after a number of attempts, and I have constructed a new data.frame as the lengths exceed those in the original data.frame. In order not to repeat the Columns data in the Values data, the Values data has substituted A for a-2, B for a-1 and so forth. The 0's are there as fillers so that exactly five probabilities contribute to each Columns value.

Final.Buscount.Alias <- data.frame(Values=rep(c("A","B", "C", "D", "E"), times=5))
Final.Buscount.Alias$Probabilities <- c(5/9,4/9,0,0,0, 0, 6/9, 0, 3/9,0, 0,0,9/9,0,0, 0,0,2/9,7/9,0, 0,0,4/9,0,5/9)
Final.Buscount.Alias$Columns <- rep(c("a-2","a-1", "a", "a+1", "a+2"), each=5)
ggplot(Final.Buscount.Alias, aes(x=factor(Columns),y=Probabilities, fill=factor(Values))) +
geom_bar(stat="identity", width=1) + 
scale_fill_manual(values=c("cyan","magenta2","gold","gray","darkolivegreen3", "black")) +
scale_x_discrete(labels=c("a-2", "a-1", "a", "a+1", "a+2"), expand=c(0,0), name="Real count") +
scale_y_continuous(breaks=seq(0,15/9, by=3/9),labels=c("0", "3/9","6/9","9/9", "12/9", "15/9"), expand=c(0,0),
                 name="Probabilities including alias") +
geom_rect(data=NULL, aes(xmin = 0.5, xmax = 5.5, ymin = 0, ymax = 9/9), color="black", fill=NA, size=1.5) +
geom_vline(xintercept=c(1.5, 2.5, 3.5, 4.5), color="gray") +
theme(panel.grid.minor.y=element_blank(), 
    panel.grid.major.y=element_line(color="gray"),
    panel.background=element_blank(), legend.position="none",
    axis.line = element_line(color="gray", size = 1))

This produces the graph:

[Figure: Incorrect Walker alias graph]

The colours appear to be correct, but there are some problems. The bar for a-1 is the only correct one. The bar at a-2 should be at a, and the bar at a should be at a-2. The bars at a+1 and a+2 are almost correct, although, strictly speaking, the order of the bars within the columns should be reversed. The graph I am trying to create is one I produced manually in Excel:

[Figure: Correct Walker alias table]

There seems to be an ordering inside ggplot2 that I don't understand.

I've read some solutions to stacked bar graphs here, here, here, here, and here, but I can't work out what I am doing wrong.

Michelle

1 Answer

I think the key issue you're having relates to how to set the order for factor variables in R. Doing factor(Columns) or factor(Values) converts those columns to factors, but the ordering is alphabetical by default. (To get a different ordering, you need to explicitly set the ordering with the levels argument, as discussed below.) That means that factor(Columns) sets the order to a, a-1, a-2, a+1, a+2. scale_x_discrete just relabels the x axis, but doesn't change the underlying data. That's why the leftmost column looked like column a (because it still was the data in a) but was relabeled to a-2.
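As a small base R sketch of that point (no ggplot2 needed), the snippet compares the default factor levels with levels set explicitly. The variable names `cols`, `default_levels` and `ordered_cols` are just for illustration, and the exact default order you see may depend on your locale's collation rules.

```r
# Column labels, in the order they should appear on the x axis.
cols <- rep(c("a-2", "a-1", "a", "a+1", "a+2"), each = 5)

# Default: the levels are the sorted unique values, so the order is
# decided by string collation (locale dependent), not by the data.
default_levels <- levels(factor(cols))

# Explicit: the levels follow the order of first appearance in the data.
ordered_cols <- factor(cols, levels = unique(cols))

print(default_levels)        # sorted order, probably not a-2 ... a+2
print(levels(ordered_cols))  # the intended plotting order
```

Relabeling the axis afterwards only changes the text drawn on the plot; the position of each bar still comes from the factor levels.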

The way to get the ordering you want is to use the factor function but to explicitly specify the ordering using the levels argument. In this case, we want the order of Columns to go from a-2 to a+2. To get the stacked bars in the correct order, we need B to come before A, and D to come before B. But then we also need to move C so that it continues to come before D. So, the final ordering for Values is C,D,B,A,E, which we can type in directly as c("C","D","B","A","E") or generate with the built-in LETTERS vector: LETTERS[c(3,4,2,1,5)]. I've set up your data with the correct orderings below.
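To see what that reordering does on its own, here is a minimal base R sketch. The names `vals` and `vals_f` are only illustrative. It shows that indexing LETTERS yields the same level vector as typing it out, and that changing the levels leaves the underlying values untouched.

```r
# The fill variable, repeated once per column as in the question.
vals <- rep(c("A", "B", "C", "D", "E"), times = 5)

# Set the level order explicitly; this is what drives the stacking order.
vals_f <- factor(vals, levels = LETTERS[c(3, 4, 2, 1, 5)])

# Indexing LETTERS gives the same vector as writing the letters out.
print(identical(LETTERS[c(3, 4, 2, 1, 5)], c("C", "D", "B", "A", "E")))

# Only the level order changes; each row keeps its original value.
print(levels(vals_f))
print(identical(as.character(vals_f), vals))
```

Because only the level order changes, the Probabilities column still lines up with the same Values in every row.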

I don't know if you want a legend, but in case you do: By default, the legend will be ordered based on the factor order. But because the Values are letters, you might want them ordered in alphabetical order. If so, set breaks=LETTERS[1:5] in scale_fill_manual (which I've done below). This changes the order in the legend, without changing the factor order in the plot.

In addition, I've labeled the color vector in scale_fill_manual to ensure that the desired colors are assigned to each level of Values (I've left "black" in there, but it's not used in the plot as specified). I made a few other coding changes as well: for example, I used geom_col instead of geom_bar to avoid the need for stat="identity", and I removed geom_rect and instead used theme to set a wider panel.border.

library(ggplot2)

Final.Buscount.Alias <- data.frame(Values=rep(c("A","B", "C", "D", "E"), times=5))
Final.Buscount.Alias$Values = factor(Final.Buscount.Alias$Values, 
                                     levels=LETTERS[c(3,4,2,1,5)])

Final.Buscount.Alias$Probabilities <- c(5/9,4/9,0,0,0, 0, 6/9, 0, 3/9,0, 0,0,9/9,0,0, 0,0,2/9,7/9,0, 0,0,4/9,0,5/9)

Final.Buscount.Alias$Columns <- rep(c("a-2","a-1", "a", "a+1", "a+2"), each=5)
Final.Buscount.Alias$Columns = factor(Final.Buscount.Alias$Columns, 
                                      levels=unique(Final.Buscount.Alias$Columns))

ggplot(Final.Buscount.Alias, aes(x=Columns, y=Probabilities, fill=Values)) +
  geom_col(width=1) + 
  scale_fill_manual(values=c(A="cyan",B="magenta2",C="gold",D="gray",E="darkolivegreen3", "black"), breaks=LETTERS[1:5]) +
  scale_x_discrete(expand=c(0,0)) +
  scale_y_continuous(breaks=seq(0, 15/9, by=3/9),
                     labels=c("0", paste0(seq(3,15,3),"/9")), 
                     expand=c(0,0)) +
  geom_vline(xintercept=c(1.5, 2.5, 3.5, 4.5), color="gray30") + # Darkened this to make it obvious where the lines are. Remove this line of code if you want the colors to abut each other.
  labs(x="Real Count", y="Probabilities including alias") +
  theme(panel.border=element_rect(size=2, fill=NA))

eipi10
  • This is weird, I've updated to ggplot2 version 2.2 - including the dependencies - and I'm getting the error `Error: could not find function "geom_col"`. I tried updating all the other packages I use, that had updates, and I still get that error. – Michelle Mar 13 '18 at 04:12
  • Not sure why that's happening. Just switch back to `geom_bar`. – eipi10 Mar 13 '18 at 04:13
  • I found the problem: RStudio now sneakily doesn't update from CRAN, and even though my documentation says 2.2.0, when I ran `packageVersion("ggplot2")` it said 2.0.0. – Michelle Mar 13 '18 at 04:23
  • Yes, thanks for catching that. I must have deleted it accidentally when I updated my answer. I've fixed it now. – eipi10 Mar 13 '18 at 06:11
  • Ah, just worked out that the order in levels for the Values variable is descending order by height of bar inside the alias table. That took me a little while to figure out. :) – Michelle Apr 04 '18 at 04:48