I have been dealing with a problem that is a mystery to me. I can create the following plot with the code below:

# Custom y-axis breaks ~
breaks_fun <- function(x){
  if (min(x) < 1){
    seq(1, 4)}
  else if (min(x) < 83){
    seq(85, 95, by = 5)}
  else if (min(x) < 100){
    seq(96, 99, by = 1)}
  else { 
    seq(0, 40000000, by = 5000000)}}


# Custom y-axis labels ~
plot_index_labels <- 0
labels_fun <- function(x) {
  plot_index_labels <<- plot_index_labels + 1L
  switch(plot_index_labels,
         scales::label_number(accuracy = 0.1, suffix = "X")(x),
         scales::label_percent(accuracy = 1, scale = 1, big.mark = "")(x),
         scales::label_percent(accuracy = 1, scale = 1, big.mark = "")(x),
         scales::label_number(accuracy = 1, big.mark = "", suffix = "M")(x))}


# Creates the panel ~
BSG_Combined <- 
 ggplot() +
  geom_violin(data = fulldfUp, aes(x = Species, y = Value),
              fill = "#ffffff", colour = "#000000", show.legend = FALSE, alpha = .9, size = .3, width = .7) +
  stat_summary(data = fulldfUp, aes(x = Species, y = Value),  
               fun = mean, geom = "point", shape = 21, size = 3.5, alpha = .9, colour = "#000000", fill = "#000000") +
  facet_grid(Category ~. , scales = "free", labeller = labeller(Category = ylabels)) +
  scale_y_continuous(breaks = breaks_fun, labels = labels_fun) +
  theme(panel.background = element_rect(fill = "#ffffff"),
        panel.grid.major = element_line(color = "#ededed", linetype = "dashed", size = .00005),
        panel.grid.minor = element_blank(), 
        panel.border = element_blank(),
        panel.spacing.y = unit(1, "cm"),
        axis.line = element_line(colour = "#000000", size = .3),
        axis.title = element_blank(),
        axis.text.x = element_text(colour = "#000000", size = 16, face = "bold", angle = 45, vjust = 1, hjust = 1),
        axis.text.y = element_text(color = "#000000", size = 16, face = "bold"),
        axis.ticks.x = element_line(color = "#000000", size = .3),
        axis.ticks.y = element_line(color = "#000000", size = .3),
        strip.background.y = element_rect(colour = "#000000", fill = "#d6d6d6", size = 0.3),
        strip.text = element_text(colour = "#000000", size = 20, face = "bold"),
        legend.position = "top",
        legend.margin = margin(t = 0, b = 0, r = 0, l = 0),
        legend.box.margin = margin(t = 10, b = 20, r = 0, l = 0),
        legend.key = element_rect(fill = NA),
        legend.background = element_blank())

[Image: the resulting faceted violin plot]

So, as you can see, the breaks_fun function works for all the facets except the last one. I have tried changing the function in countless ways, including adding a final else if [e.g. (min(x) > 1000)], but no combination works. Dropping labels = labels_fun gives the same result. I have used this very same function successfully on another data set, but something seems to go wrong when I apply it to this new data.

Some info:

> summary(TotalReads$Value)
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
  576504  6209678 10267860 10715209 14395940 34754853

So the only difference between this data and my previous data is that here the gap between the min and max values is bigger. However, I do not see how this could affect the breaks_fun function, since these values are so different from the ones in the other 3 categories that the final else should be more than enough anyway.

Would anyone be able to spot something that I have been missing? I would very much appreciate any help.

Best regards, George.

  • It would be easier to help you if you provided [a minimal reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) including a snippet of your data or some fake data. Also, as your theme adjustments are not important for your issue, I would suggest removing them. That makes it easier to focus on the core of your issue. – stefan Feb 04 '22 at 21:29

1 Answer

The crux of the matter is that breaks_fun is passed the limits of the scale, not the data, and by default those limits are expanded beyond the data range. See the expand argument in the help for scale_y_continuous:

"For position scales, a vector of range expansion constants used to add some padding around the data to ensure that they are placed some distance away from the axes. Use the convenience function expansion() to generate the values for the expand argument. The defaults are to expand the scale by 5% on each side for continuous variables, and by 0.6 units on each side for discrete variables."

For the last data set, this makes the bottom limit negative. min(x) < 1 is then true, so the first branch fires and the breaks are created at seq(1, 4). Add a print statement to breaks_fun to see this. (I used your summary values as the data set, which reproduced the behavior.)

smpData <- c(576504,  6209678, 10267860, 10715209, 14395940, 34754853)
smpDf <- data.frame(
  Value = rep(smpData, 5),
  Species = rep(LETTERS[1:5], each = length(smpData))
)


breaks_fun <- function(x){
  print(x)  # show the limits that ggplot2 passes in
  # ... <other code> ...
}

#>[1] -1132413 36463770
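
The printed limits can be reproduced by hand, which makes the effect of the default expansion explicit. This is a minimal base R sketch that assumes the default 5% multiplicative padding on each side of the data range; the names data_range, pad, and expanded_limits are just for illustration.

```r
# Same sample values as above (taken from the summary in the question).
smpData <- c(576504, 6209678, 10267860, 10715209, 14395940, 34754853)

# Range of the data itself.
data_range <- range(smpData)

# Default continuous expansion: 5% of the data range on each side.
pad <- 0.05 * diff(data_range)
expanded_limits <- c(data_range[1] - pad, data_range[2] + pad)

print(round(expanded_limits))
#> [1] -1132413 36463770
```

The lower limit drops below zero even though every data value is positive, and that lower limit is what min(x) sees inside breaks_fun.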

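It seems you have two options. One: instead of using min to pick the breaks, use max. Note that the thresholds are then compared against the upper limit of each facet, so they will likely need adjusting to your data. breaks_fun becomes (linted a bit to my coding style):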

breaks_fun <- function(x){
  caseVal <- max(x)

  if (caseVal < 1){
    seq(1, 4)
  } else if (caseVal < 83){
    seq(85, 95, by = 5)
  } else if (caseVal < 100){
    seq(96, 99, by = 1)
  } else {
    seq(0, 40000000, by = 5000000)
  }
}
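
To check the effect on the last facet without drawing anything, both variants can be called directly on the expanded limits printed earlier. This is only a sketch: breaks_min and breaks_max are hypothetical names for the question's min-based function and the max-based one above, and only the last facet's limits are tested here.

```r
# Shared branch logic; caseVal is whichever end of the limits we dispatch on.
pick_breaks <- function(caseVal) {
  if (caseVal < 1) {
    seq(1, 4)
  } else if (caseVal < 83) {
    seq(85, 95, by = 5)
  } else if (caseVal < 100) {
    seq(96, 99, by = 1)
  } else {
    seq(0, 40000000, by = 5000000)
  }
}

breaks_min <- function(x) pick_breaks(min(x))  # original behavior
breaks_max <- function(x) pick_breaks(max(x))  # behavior after the change

# Expanded limits of the last facet, as printed by print(x).
lims <- c(-1132413, 36463770)

breaks_min(lims)         # falls into the first branch: 1 2 3 4
range(breaks_max(lims))  # falls into the final else branch: 0 to 4e+07
```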

Alternatively, you can set the expand argument, but your plots will then rest on the x-axis.

scale_y_continuous(
    breaks = breaks_fun, labels = labels_fun, expand = c(0, 0)
  )

You can also combine this with setting the limits argument explicitly to give you some spacing, but make sure to also adjust expand so that the limits stay positive and still satisfy the conditions in breaks_fun.
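
One possible middle ground, sketched here as an assumption rather than something tested on the original data, is asymmetric padding: none below and 5% above. The lower limit then equals the data minimum and stays positive. The sketch uses expansion() from ggplot2 (available from version 3.3.0) only if the package is installed; mult, limits, and y_scale are illustrative names.

```r
# Data range of the last facet, from the summary in the question.
data_range <- c(576504, 34754853)

# Assumed asymmetric padding: 0% below, 5% above.
mult <- c(0, 0.05)
limits <- data_range + c(-1, 1) * mult * diff(data_range)

# The lower limit is now the data minimum, so a min-based breaks_fun
# would reach its final else branch (min(x) >= 100).
min(limits) >= 100

# Matching scale, built only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  y_scale <- ggplot2::scale_y_continuous(
    expand = ggplot2::expansion(mult = mult)
  )
}
```

The same expand setting applies to every facet, so it is worth checking that the other panels still reach their intended branch in breaks_fun. The violins will also touch the bottom of each panel.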

  • Thanks very much, @Marcus! I did try to use `expand`, but it did not look super nice, as you anticipated. Thus, I changed the `breaks_fun` function as you indicated and now it works! I had no idea that the function was going through the limits and not the data! Thanks a lot!

    # Custom y-axis breaks ~
    breaks_fun <- function(x){
      caseVal <- max(x)
      print(x)
      if (caseVal < 6){
        seq(1, 4, by = 1)}
      else if (caseVal < 99.1){
        seq(96, 99, by = 1)}
      else if (caseVal < 99.3){
        seq(85, 100, by = 5)}
      else {
        seq(5000000, 35000000, by = 10000000)}}

    – George Pacheco Feb 05 '22 at 00:03