
In the artificial data I have created for the MWE below, I have tried to demonstrate the essence of a script I have written in R. As can be seen from the graph this code produces, one of my conditions has no "No" value to complete the series.

I have been told that unless I can make this last column, which lacks the second series, as thin as the columns elsewhere in the graph, I won't be permitted to use these graphs. This is a problem because the script I have written produces hundreds of graphs simultaneously, complete with stats, significance indicators, propagated error bars, and intelligent y-axis adjustments (these features are of course not present in the MWE).

A few other comments:

  • This exception column is not guaranteed to be at the end of the graph, so manual tweaking to force the series to change color and invert the order, leaving the extra space on the right-hand side, isn't reliable.

  • I have tried to simulate the data as a constant 0 so that the series "is present" but invisible, but as would be expected, the order of the series c(No,Yes) makes this skip a space, which is also unacceptable. This is how the same question was answered in the following posts, but sadly it doesn't work for me with my restrictions: "Consistent width for geom_bar in the event of missing data" and "Include space for missing factor level used in fill aesthetics in geom_boxplot".

  • I also tried to do this with facets, but numerous issues arose there, including line breaks and errors in the annotations I add to the x-axis.

MWE:

library(ggplot2)

print("Program started")

x <- c("1","2","3","1","2","3","4")
s <- c("No","No","No","Yes","Yes","Yes","Yes")
y <- c(1,2,3,2,3,4,5)
df <- as.data.frame(cbind(x,s,y))

print(df)

gg <- ggplot(data = df, aes_string(x="x", y="y", weight="y", ymin=paste0("y"), ymax=paste0("y"), fill="s"));
dodge_str <- position_dodge(width = NULL, height = NULL);
gg <- gg + geom_bar(position=dodge_str, stat="identity", size=.3, colour = "black")

print(gg)

print("Program complete - a graph should be visible.")
EngBIRD
  • @DavidRobinson I believe the reason skipping space (aesthetically) is not valid comes down to the same reason why x-axis spacing consistency vs column width consistency is invalid: my boss's aesthetic preference, which I have no power to argue with. I appreciate your rapid comment, but from the perspective of "the value is possible", this actually is not a correct statement because of the nature of the factors behind this condition. From a technical standpoint it may actually be impossible to set up this condition without the series-identified stimulant. – EngBIRD Mar 06 '15 at 00:25
  • Do you have the same problem as described [here](http://stackoverflow.com/questions/11020437/consistent-width-for-geom-bar-in-the-event-of-missing-data)? Does the solution help? – tonytonov Mar 06 '15 at 14:42
  • @tonytonov Thanks for your response, but as per the second bullet in my list of comments, the extra space that this approach generates is sadly worse in my boss's opinion. I think the big difference in my case is that our data has significantly fewer columns than in that post, so the extra space is really pronounced. – EngBIRD Mar 06 '15 at 15:48
  • Ah, sorry. Yes, I can confirm that the order breaks on your data, whereas it seemingly shouldn't. In the solution above `a` is missing and its place is preserved, but here `No` is missing and its place is shifted. Strange, I cannot say why that happens. – tonytonov Mar 06 '15 at 16:08

2 Answers

Yeah, I figured out what happened: you need to be extra careful about factors being factors and numerics being numerics. In my case, with stringsAsFactors = FALSE, I have

str(df)
'data.frame':   7 obs. of  3 variables:
 $ x: chr  "1" "2" "3" "1" ...
 $ s: chr  "No" "No" "No" "Yes" ...
 $ y: chr  "1" "2" "3" "2" ...

dput(df)
structure(list(x = c("1", "2", "3", "1", "2", "3", "4"), s = c("No", 
"No", "No", "Yes", "Yes", "Yes", "Yes"), y = c("1", "2", "3", 
"2", "3", "4", "5")), .Names = c("x", "s", "y"), row.names = c(NA, 
-7L), class = "data.frame")

with no factors, and the numeric column turned into character because of the cbind-ing (sic!). Let us build another data frame:

dff <- data.frame(x = factor(df$x), s = factor(df$s), y = as.numeric(df$y))
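The coercion is easy to reproduce in isolation. A minimal check, using the vectors from the question (the names `m` and `ok` are only for illustration):

```r
x <- c("1", "2", "3", "1", "2", "3", "4")
s <- c("No", "No", "No", "Yes", "Yes", "Yes", "Yes")
y <- c(1, 2, 3, 2, 3, 4, 5)

# cbind() on atomic vectors builds a matrix, and a matrix has a single type,
# so the numeric y is coerced to character along with everything else
m <- cbind(x, s, y)
is.character(m)

# data.frame() keeps each column's own type, so y stays numeric
ok <- data.frame(x = factor(x), s = factor(s), y = y)
str(ok)
```

Building the data frame directly from the vectors avoids the detour through a character matrix altogether.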

Adding a "dummy" row (manually for your example; check out the expand.grid version in the linked question for how to do this automatically):

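# Pass the new row as a data frame: a c() vector would be all character and turn y into character
dff <- rbind(dff, data.frame(x = "4", s = "No", y = NA))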
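For the automatic version, one possible sketch (not the code from the linked question) builds every x/s combination with expand.grid() and left-joins the data onto it, so that missing combinations come out as rows with y = NA. The names `all_combos` and `padded` are made up for this example:

```r
x <- c("1", "2", "3", "1", "2", "3", "4")
s <- c("No", "No", "No", "Yes", "Yes", "Yes", "Yes")
y <- c(1, 2, 3, 2, 3, 4, 5)
dff <- data.frame(x = factor(x), s = factor(s), y = y)

# Every combination of the x and s levels, whether observed or not
all_combos <- expand.grid(x = levels(dff$x), s = levels(dff$s))

# Left join: combinations absent from dff get y = NA
padded <- merge(all_combos, dff, by = c("x", "s"), all.x = TRUE)
print(padded)
```

Because the padding is derived from the factor levels, it should not matter which x value is missing a series or where it sits on the axis.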

Plotting (I removed extra aes):

ggplot(data = dff, aes(x, y, fill=s)) + 
  geom_bar(position=dodge_str, stat="identity", size=.3, colour="black")

[Image: bar chart produced by the code above]
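More recent ggplot2 releases than the one used above also let the dodge itself keep a lone bar narrow, without any padded row. The sketch below assumes a ggplot2 version in which `position_dodge()` accepts `preserve = "single"`, and it uses `geom_col()` in place of `geom_bar(stat = "identity")`. It is a different technique from the dummy row, and it only fixes the bar width: it does not close up the spacing between groups.

```r
library(ggplot2)

dff <- data.frame(
  x = factor(c("1", "2", "3", "1", "2", "3", "4")),
  s = factor(c("No", "No", "No", "Yes", "Yes", "Yes", "Yes")),
  y = c(1, 2, 3, 2, 3, 4, 5)
)

# preserve = "single" keeps every bar at the width of one series,
# even when a group contains fewer series than the others
p <- ggplot(dff, aes(x, y, fill = s)) +
  geom_col(position = position_dodge(preserve = "single"),
           colour = "black")
print(p)
```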

tonytonov
  • The only reason I have numbers in quotes on my x-axis is that I was in too big a rush posting the question to come up with alphabetical strings (and completely overlooked a, b, c etc.). Thanks for your answer, I learned something about factors and cbind because of it, but until I find a way to remove that extra blank space I am sunk. – EngBIRD Mar 06 '15 at 16:32
  • Do I understand correctly that you'd like to keep the fourth column width as in my graph above, but padded to the left, closer to the third? Wouldn't that force '4' on the scale to become closer to '3' and make the scale uneven? – tonytonov Mar 06 '15 at 16:37
  • Yes absolutely - thin and padded. Because my graph in my actual script has no gridlines and full strings for x-axis labels my boss wants homogeneity for the visual appearance of the bars and spaces, label homogeneity is less important. – EngBIRD Mar 06 '15 at 16:40
  • I see. That's tricky, I'll have to think a little bit. Probably involves some hacking with `scale_x_...`. – tonytonov Mar 06 '15 at 16:50
  • In the meantime, check out a somewhat similar trick I'm showing [here](http://stackoverflow.com/questions/28363933/ggplot-how-to-change-boxplot-settings-when-stat-summary-is-used/28365282#28365282). – tonytonov Mar 06 '15 at 16:53
  • I think I will use your solution here, i.e. force the space, and I will add a text annotation N.D. I hope this isn't too off topic (if it is, I will open a new question), but in a case where factors aren't explicitly used in the data frame (and you get the stretching vs spacing), is there a way I can detect or force the addition of this space? My data is imported into a data.frame from a SQL import and subset using `[ ]`, and my string x-axis is treated with a `factor(x-axis)` command. – EngBIRD Mar 06 '15 at 17:01
  • Difficult to say offhand. I recommend asking a separate question, linking back here. – tonytonov Mar 06 '15 at 17:05

At the expense of doing your own calculation for the x coordinates of the bars as shown below, you can get a chart which may be close to what you're looking for.

library(ggplot2)

x <- c("1","2","3","1","2","3","4")
s <- c("No","No","No","Yes","Yes","Yes","Yes")
y <- c(1,2,3,2,3,4,5)
# data.frame(x, s, y) keeps y numeric; going through cbind() would coerce it to character
df <- data.frame(x, s, y)

# Number the bars consecutively, ordered by x and then by series
df$x_pos[order(df$x, df$s)] <- 1:nrow(df)

# Per x value: number of bars and center of the group (used for the axis labels)
x_stats <- as.data.frame.table(table(df$x), responseName="x_counts")
x_stats$center <- as.numeric(tapply(df$x_pos, df$x, mean))
df <- merge(df, x_stats, by.x="x", by.y="Var1", all=TRUE)

bar_width <- .7
# A lone bar stays in its slot; paired bars are shifted toward each other so that they touch
df$pos <- ifelse(df$x_counts == 1, df$x_pos,
                 ifelse(df$s == "No",
                        df$x_pos + .5 - bar_width/2,
                        df$x_pos - .5 + bar_width/2))
print(df)

gg <- ggplot(data=df, aes(x=pos, y=y, fill=s))
gg <- gg + geom_bar(position="identity", stat="identity", size=.3,
                    colour="black", width=bar_width)
gg <- gg + scale_x_continuous(breaks=df$center, labels=df$x)
plot(gg)

----- edit --------------------------------------------------

Modified to place the labels at the center of bars.

Gives the following chart

[Image: resulting bar chart]
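The breaks passed to scale_x_continuous() above are taken from df, so each group's center and label are listed once per bar. A possible variation, sketched here with the illustrative names `n_in_group`, `axis_breaks` and `axis_labels` (none of which are in the original code), computes one break per x value instead:

```r
library(ggplot2)

df <- data.frame(
  x = c("1", "2", "3", "1", "2", "3", "4"),
  s = c("No", "No", "No", "Yes", "Yes", "Yes", "Yes"),
  y = c(1, 2, 3, 2, 3, 4, 5)
)
bar_width <- .7

# Same positions as above: consecutive slots, paired bars pulled together
df$x_pos[order(df$x, df$s)] <- seq_len(nrow(df))
n_in_group <- ave(df$x_pos, df$x, FUN = length)
df$pos <- ifelse(n_in_group == 1, df$x_pos,
                 ifelse(df$s == "No",
                        df$x_pos + .5 - bar_width / 2,
                        df$x_pos - .5 + bar_width / 2))

# One break per x value instead of one per bar
centers <- tapply(df$x_pos, df$x, mean)
axis_breaks <- as.numeric(centers)
axis_labels <- names(centers)

gg <- ggplot(df, aes(x = pos, y = y, fill = s)) +
  geom_col(position = "identity", colour = "black", width = bar_width) +
  scale_x_continuous(breaks = axis_breaks, labels = axis_labels)
print(gg)
```

The bar positions are unchanged; only the vectors handed to the scale are shorter.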

WaltS
  • Neat answer, can this be done without duplicating the x-axis labels? – EngBIRD Mar 06 '15 at 17:51
  • Your code needs some formatting in order to improve its readability. It's good that the OP got their answer, but for subsequent viewers, cleaner formatting may help others as well. – Danny Bullis Jul 26 '21 at 03:10