I would like to fix the secondary axis to run from 0 to 1, since a probability always lies between 0 and 1. However, if the maximum return (y) is high (6% in the diagram below), the secondary y-axis exceeds 1 because its scale depends on the primary y-axis, which is not ideal for presentation. How can I limit the maximum of the secondary y-axis?

enter image description here

I have heard that ggplot2 still does not allow setting limits on a secondary axis, but I am not sure whether that is true. Is there any other way to limit the secondary y-axis so that the maximum probability shown on the graph is always 1, or less than 1 depending on the maximum return? Please note that y in the code is the return.

y <- c(0.01, -0.005, 0.06)
Month <- c("Jan", "Feb", "Mar")

dtf <- data.frame(Month, y)

library(ggplot2)
library(reshape)  # provides melt()
dtf2 <- melt(dtf)
dtf2[["sign"]] <- ifelse(dtf2[["value"]] >= 0, "positive", "negative")
Probability <- c(0.4, 0.22, 0.54)

dtf2 <- data.frame(dtf2, Probability)

ggplot(dtf2) +
  geom_bar(mapping = aes(x=Month, y=value, fill = sign), stat ="identity", width = 0.75)+
  geom_point(mapping = aes(x=Month, y=Probability/20-0.025), size = 2.5, color="blue")+
  geom_line(mapping = aes(x=Month, y=Probability/20-0.025, group = 1), color = "blue", size = 1)+
  geom_hline(yintercept=0)+
  geom_text(aes(x=Month, y=value, label = paste(y * 100,"%"), vjust = ifelse(y >= 0, -0.5, 1.2)), hjust = 0.5)+
  ylab("\nReturn")+
  theme(axis.text.x = element_text(face = "bold", size=11),
        axis.text.y = element_text(face = "bold"),
        axis.title.x=element_blank(),
        legend.position = "none",
        plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))+
  theme(panel.background = element_blank(),
        axis.ticks.x = element_blank()) +
  theme(axis.line.y = element_line(color="black", size = 0.5))+
  scale_y_continuous(labels=scales::percent, sec.axis = sec_axis(~(. + 0.025)*20, name = "Probability\n" ))+
  scale_fill_manual(values = c("positive" = "black", "negative" = "red"))
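
To make the problem concrete, here is a minimal sketch of the two transforms hard-coded in the plot above. The function names `to_probability` and `to_primary` are made up for illustration; only the constants 20 and 0.025 come from the code.

```r
# Transform used by sec_axis() above: primary-axis value -> probability.
to_probability <- function(primary) (primary + 0.025) * 20

# Its inverse, used in aes() to draw the probability on the primary scale.
to_primary <- function(probability) probability / 20 - 0.025

to_primary(1)         # a probability of 1 is drawn at a return of 0.025
to_probability(0.06)  # so the 6% bar reaches about 1.7 on the secondary axis
```

Because the constants are fixed, any return above 2.5% pushes the secondary axis past 1.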

I would like the maximum of the secondary y-axis to always be 1, or less than 1 if the returns are lower. Please give some advice, thanks!

Elvis
  • This kind of graph is exactly the reason why secondary axes are so difficult in ggplot2. Hadley (rightly) does not want to support this. – Roland Aug 14 '19 at 07:16
  • Do you have any suggestion for how I could plot exactly this graph using another function in R instead of ggplot? – Elvis Aug 14 '19 at 07:26
  • My personal opinion is that you should not create this graph. Thus, I won't make the effort to show you how to do this with base graphics. Maybe someone else would be willing. – Roland Aug 14 '19 at 07:32
  • I totally agree with @Roland. Maybe to illustrate this further: How is one supposed to interpret the bar with 6%, which exceeds 100% on the probability scale? What exactly is the connection between the left scale and the right one? How does the reader know which scale to pick when? I suggest having a look here: https://stackoverflow.com/a/3101876/5892059 – kath Aug 14 '19 at 07:36
  • The probability shown on the right scale is actually the probability of getting a positive return. So in this case, the probability of a positive return in March is 0.54, and in January it is 0.4. The plot is required by the client; they want me to combine two plots, so I have no choice. Of course, the actual data is not like this, so the example may not make sense. Any suggestions would be appreciated. – Elvis Aug 14 '19 at 07:44

1 Answer

library(tidyverse)
library(reshape2)  # provides melt()
y <- c(0.01, -0.05, 0.06)
Month <- c("Jan", "Feb", "Mar")
dtf <- data.frame(Month, y)
dtf2 <- melt(dtf)
dtf2[["sign"]] <- ifelse(dtf2[["value"]] >= 0, "positive", "negative")
Probability <- c(0.4, 0.22, 0.54)
# Range and minimum of the returns, used to rescale Probability onto the primary axis
yrange <- max(y) - min(y)
ymin <- min(y)
dtf2 <- data.frame(dtf2, Probability)

dtf2 %>%  ggplot() + 
  geom_bar(mapping = aes(x=Month, y=value, fill = sign), stat ="identity", width  = 0.75)+
  geom_point(mapping = aes(x=Month, y=Probability*yrange+ymin), size = 2.5,  color="blue") +
  geom_line(mapping = aes(x=Month, y=Probability*yrange+ymin, group = 1), color = "blue", size = 1)+
  geom_hline(yintercept=0)+
  geom_text(aes(x=Month, y=value, label = paste(y * 100,"%"), vjust = ifelse(y >= 0, -0.5, 1.2)), hjust = 0.5)+
  ylab("\nReturn")+
  theme(axis.text.x = element_text(face = "bold", size=11), axis.text.y = element_text(face = "bold"),
    axis.title.x=element_blank(),
    legend.position = "none",
    plot.title = element_text(hjust = 0.5),
    plot.subtitle = element_text(hjust = 0.5))+
  theme(panel.background = element_blank(),
    axis.ticks.x = element_blank()) +
  theme(axis.line.y = element_line(color="black", size = 0.5))+
  scale_y_continuous(labels=scales::percent, sec.axis = sec_axis(~(. -ymin)/yrange, name = "Probability\n", breaks = c(0,0.5,1)))+
  scale_fill_manual(values = c("positive" = "black", "negative" = "red"))

enter image description here
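
The rescaling behind `yrange` and `ymin` can be sketched on its own, which may help when the data changes. This is only an illustration: `make_prob_scale` is a hypothetical helper, not part of ggplot2, and it assumes the returns are not all equal.

```r
# Hypothetical helper: builds the two transforms between the probability
# scale [0, 1] and the observed range of the returns.
make_prob_scale <- function(returns) {
  lo <- min(returns)
  span <- max(returns) - lo
  stopifnot(span > 0)  # identical returns would give a zero span
  list(
    to_primary   = function(p) p * span + lo,    # for aes(y = ...)
    to_secondary = function(v) (v - lo) / span   # for sec_axis(~ ...)
  )
}

sc <- make_prob_scale(c(0.01, -0.05, 0.06))
sc$to_primary(c(0, 1))           # probabilities 0 and 1 land on the min and max return
sc$to_secondary(c(-0.05, 0.06))  # and the min and max return map back to 0 and 1
```

Since ggplot2 by default expands a continuous axis slightly beyond the data, the secondary axis still extends a little below 0 and above 1; setting `breaks = c(0, 0.5, 1)` as in the code above keeps the labels within 0 to 1.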

Zhiqiang Wang
  • Hi Wang Zhi Qiang, thanks for your reply. I have tried your modified code and yes, it makes the probability run from 0 to 1. However, if there is a very negative return, e.g. -5%, the probability scale will show negative values as well. I'm dealing with different data sets, so the output needs to be more flexible. – Elvis Aug 14 '19 at 08:13
  • Thanks Zhi Qiang, it works perfectly for me! Thanks for your help! :) – Elvis Aug 15 '19 at 02:28