I am using ggplot2 to produce a plot with a double y-axis. The data frame I am working with is s4 (its dput output is included at the end). It consists of an id variable and two continuous variables, x and y. The code I use for the plot is:

library(dplyr)
library(ggplot2)
library(tidyr)
#Transform
DF_long <- s4 %>%pivot_longer(names_to = "y_new", values_to = "val", x:y)
#Plot
ggplot(DF_long, aes(x=id)) +
  geom_bar( aes(y = val, fill = y_new, group = y_new),
            stat="identity", position=position_dodge(),alpha=.6)  +
  scale_fill_manual(values = c("blue", "red")) +
  scale_y_continuous(sec.axis = sec_axis(~.*0.1))+
  theme(axis.text.x = element_text(size=7,color='black',face='bold',angle = 90),
        axis.text.y = element_text(size=7,color='black',face='bold'),
        plot.title = element_text(hjust = 0.5,size=14,face="bold"),
        axis.title=element_text(size=10,face="bold"),
        strip.text.x = element_text(size = 8, face = "bold"),
        legend.position = "top",legend.title = element_blank(),panel.grid = element_blank(),
        legend.text = element_text(face='bold'),
        axis.title.x = element_blank()) 

The code works, but it does not produce my desired output.

My issue is that I cannot see the x variable because of the limits of both y-axes. I would like to be able to see both variables. Could you please help me adjust the plot in my code? The dput output of s4 is:

s4 <- structure(list(id = c("s1", "s2", "s3", "s4", "s5", "s6", "s7", 
"s8", "s9", "s10", "s11", "s12", "s13", "s14", "s15", "s16", 
"s17", "s18", "s19", "s20", "s21", "s22", "s23", "s24"), x = c(405L, 
409L, 257L, 306L, 509L, 103L, 100L, 118L, 41L, 231L, 93L, 255L, 
49L, 132L, 305L, 145L, 57L, 124L, 73L, 46L, 115L, 108L, 45L, 
26L), y = c(48148371.54, 35373940.7, 5256435.59, 5155308.9, 4155030.89, 
3792519.09, 2468987.02, 2264228.41, 2016421.67, 2001806.46, 1971658.78, 
1531488.5, 1358481.17, 1331466.48, 1072746.35, 992129.81, 954277.63, 
846098.66, 810819.33, 635270.45, 383283.61, 345273.12, 290598.09, 
265288.75)), row.names = c(NA, -24L), class = c("tbl_df", "tbl", 
"data.frame"))

Many thanks for your help.

Duck
  • What are the values of the second y axis referring to? plotting with two y axis is generally not considered a good thing: https://stackoverflow.com/questions/3099219/ggplot-with-2-y-axes-on-each-side-and-different-scales – Peter May 22 '20 at 23:21
  • @Peter Thanks for your help. `x` is number of items and `y` is amount in dollars. That is why I need two axis. – Duck May 22 '20 at 23:23
  • @Peter Oh thanks, it is what I need. Just I would like to know if there is any way to determine the scaling factor? I have trouble on that. I will accept your answer. – Duck May 23 '20 at 12:29
  • @Peter Yes please that would be pretty useful. – Duck May 23 '20 at 12:33
  • Answer updated now – Peter May 23 '20 at 12:58

1 Answer

Your comments make the question clearer; I've updated the answer accordingly.

Using a dual-scaled y-axis, especially with bars, is generally considered inappropriate and should be discouraged (paraphrased from the Few article noted below).

For discussion of the issue, see "How can I plot with 2 different y-axes?", "ggplot with 2 y axes on each side and different scales" (https://stackoverflow.com/questions/3099219/ggplot-with-2-y-axes-on-each-side-and-different-scales) and the linked article by Stephen Few: http://www.perceptualedge.com/articles/visual_business_intelligence/dual-scaled_axes.pdf

library(ggplot2)
library(dplyr)
library(tidyr)

Determine a scaling factor to transform your y2 data to "map" to the y1 data

There may be a programmatic way to do this, but I've done it by looking at the ranges of the two variables and using a bit of trial and error to see what works on the graph. Check the ratios between the two y variables; your y2_factor is likely to be somewhere between these ratios, but you need to play around with what looks best in the graph. I've opted for 90,000, which is close to the ratio of the maximums of the variables.


range(s4$y)/range(s4$x)
#> [1] 10203.41 94594.05

y2_factor <- 90000
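
One possible programmatic starting point, offered as a sketch rather than as the method used above: take the ratio of the two maxima and round it to one significant digit. The names `ratio_of_maxima` and `y2_factor_auto` are hypothetical, and the `x` and `y` vectors are copied from the `s4` data in the question so the snippet runs on its own. You would still want to check the result visually.

```r
# x and y copied from the s4 data frame in the question
x <- c(405, 409, 257, 306, 509, 103, 100, 118, 41, 231, 93, 255,
       49, 132, 305, 145, 57, 124, 73, 46, 115, 108, 45, 26)
y <- c(48148371.54, 35373940.7, 5256435.59, 5155308.9, 4155030.89,
       3792519.09, 2468987.02, 2264228.41, 2016421.67, 2001806.46,
       1971658.78, 1531488.5, 1358481.17, 1331466.48, 1072746.35,
       992129.81, 954277.63, 846098.66, 810819.33, 635270.45,
       383283.61, 345273.12, 290598.09, 265288.75)

# Ratio of the maxima: scaling x by this makes its tallest bar
# as tall as the tallest y bar
ratio_of_maxima <- max(y) / max(x)

# Round to one significant digit to get a "nice" factor
# (an assumption about what reads well, not a rule)
y2_factor_auto <- signif(ratio_of_maxima, 1)

ratio_of_maxima
y2_factor_auto
```

For this data the rounded value coincides with the 90,000 chosen by hand; with other data the rounding may need adjusting.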



Transform your original x values

I've done this by creating a separate variable so there is no confusion between the values used for plotting and the true values.

DF_long <- 
  s4  %>% 
  mutate(x1 = x * y2_factor) %>% 
  pivot_longer(names_to = "y_new", values_to = "val", c(x1, y))
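
As a quick sanity check on the transformation (a sketch, not part of the original workflow), you can confirm that dividing the scaled values by the same factor recovers the original counts. This is the inverse that the secondary axis relies on. The names `x_scaled` and `x_back` are hypothetical, and `x` holds the first five `x` values from `s4`.

```r
y2_factor <- 90000

# First five x values from s4 (number of items)
x <- c(405, 409, 257, 306, 509)

# Forward transformation: the heights actually drawn against the primary axis
x_scaled <- x * y2_factor

# Inverse transformation: what the secondary axis labels show
x_back <- x_scaled / y2_factor

# TRUE if the round trip recovers the original values
isTRUE(all.equal(x_back, x))

# Largest scaled x value, for comparison with max(s4$y)
max(x_scaled)
```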

Plot

ggplot(DF_long, aes(x=id)) +
  geom_bar(aes(y = val, fill = y_new, group = y_new),
            stat="identity", position=position_dodge(),alpha=.6)  +
  scale_fill_manual(values = c("blue", "red"), labels = c("Number of items", "Dollars")) +
  # To get meaningful labels, invert the transformation on the secondary axis by dividing values by the scaling factor
  # You can adjust the breaks and axis title to suit...
  scale_y_continuous(sec.axis = sec_axis(~. / y2_factor, breaks = seq(0, 600, by = 50), name = "Number of items" ))+
  theme(axis.text.x = element_text(size=7,color='black',face='bold',angle = 90),
        axis.text.y = element_text(size=7,color='black',face='bold'),
        plot.title = element_text(hjust = 0.5,size=14,face="bold"),
        axis.title=element_text(size=10,face="bold"),
        strip.text.x = element_text(size = 8, face = "bold"),
        legend.position = "top",legend.title = element_blank(),panel.grid = element_blank(),
        legend.text = element_text(face='bold'),
        axis.title.x = element_blank()) 

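You end up with a dodged bar chart in which the dollar amounts are read from the left axis and the number of items from the right axis.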




Created on 2020-05-23 by the reprex package (https://reprex.tidyverse.org) (v0.3.0)



Peter