Secondary y-axis in ggplot in R?

Question

I went through several of the secondary-axis solutions proposed here but couldn't get any of them to work. I am trying to plot Elevation on the left y-axis and FlowA and FlowB on the right y-axis. My sample code plots Elevation, but I am struggling to get the FlowA and FlowB variables onto the secondary axis. Any help would be appreciated.

library(lubridate)
library(tidyverse)

set.seed(123)

FakeData <- data.frame(Date = seq(as.Date("2001-01-01"), to= as.Date("2001-12-31"), by="day"),
                 Elevation = runif(365, 806.8,807.8),
                 FlowA = runif(365,8,15),
                 FlowB = runif(365,1,3))
ggplot(FakeData, aes(x = Date, y = Elevation))+
  geom_line()

score 1 · Accepted Answer · answered Aug 22 '20 at 20:30

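I would suggest the following approach. Note that the result depends on your data: here all Elevation values are close to 800, while the flows stay below 15. For the second axis, you have to define a scaling factor so that all variables are displayed properly on a common scale: the flows are multiplied by it for plotting, and the secondary axis divides by it to show the original units. Here is the code: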

library(lubridate)
library(tidyverse)

set.seed(123)
#Data
FakeData <- data.frame(Date = seq(as.Date("2001-01-01"), to= as.Date("2001-12-31"), by="day"),
                       Elevation = runif(365, 806.8,807.8),
                       FlowA = runif(365,8,15),
                       FlowB = runif(365,1,3))
#Scale factor
scalefactor <- max(FakeData$Elevation)/max(max(FakeData$FlowA),max(FakeData$FlowB))
#Plot
ggplot(FakeData, aes(x = Date))+
  geom_line(aes(y = Elevation,group=1,color='Elevation'),show.legend = TRUE)+
  geom_line(aes(y = FlowA*scalefactor, color = 'FlowA'))+
  geom_line(aes(y = FlowB*scalefactor, color = 'FlowB'))+
  scale_y_continuous(sec.axis = sec_axis(~./scalefactor, name = 'Flow A and Flow B'))

The output: [figure: Elevation on the left y-axis, FlowA and FlowB on the right y-axis]
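
With a purely multiplicative factor, the left axis has to stretch from the smallest rescaled flow up to about 808, so the Elevation line can look almost flat. One possible variation, which is not part of the answer above, is to map the flows onto the Elevation range with an offset as well as a slope, and to invert that mapping in sec_axis(). The sketch below assumes the same FakeData; the names elev_range, flow_range, slope, to_primary and to_secondary are hypothetical helpers introduced only for illustration.

```r
library(ggplot2)

set.seed(123)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2001-12-31"), by = "day"),
  Elevation = runif(365, 806.8, 807.8),
  FlowA = runif(365, 8, 15),
  FlowB = runif(365, 1, 3)
)

# Ranges of the primary (Elevation) and secondary (flow) data
elev_range <- range(FakeData$Elevation, na.rm = TRUE)
flow_range <- range(c(FakeData$FlowA, FakeData$FlowB), na.rm = TRUE)

# Hypothetical helpers: linear map from flow units to elevation units, and its inverse
slope <- diff(elev_range) / diff(flow_range)
to_primary   <- function(flow) elev_range[1] + (flow - flow_range[1]) * slope
to_secondary <- function(y)    flow_range[1] + (y - elev_range[1]) / slope

p <- ggplot(FakeData, aes(x = Date)) +
  geom_line(aes(y = Elevation, color = "Elevation")) +
  geom_line(aes(y = to_primary(FlowA), color = "FlowA")) +
  geom_line(aes(y = to_primary(FlowB), color = "FlowB")) +
  scale_y_continuous(
    name = "Elevation",
    # The secondary axis undoes the mapping applied to the flows
    sec.axis = sec_axis(~ to_secondary(.), name = "Flow A and Flow B")
  )
p
```

Because sec_axis() only relabels the primary axis, the transformation applied to the plotted values and the one given to sec_axis() must be exact inverses of each other, which is why both directions are written as functions here.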

answered Aug 22 '20 at 20:30

Duck


I really like your solution, many thanks. There is just one problem: in my actual data, one variable has some missing values (`NA`). When I use `scale_y_continuous(sec.axis = sec_axis(~./scalefactor, name = 'Flow A and Flow B'))`, the variable that contains `NA` does not get rescaled. Is there any way to address this issue? – Hydro Aug 23 '20 at 01:47
@Hydro Depending on how many `NA` values you have, you could impute the missing data, or perhaps use a spline method to fill in those values. I hope that helps :) – Duck Aug 23 '20 at 12:24
@Hydro Also, if you have `NA` values, please compute the scale factor with `scalefactor <- max(FakeData$Elevation,na.rm=T)/max(max(FakeData$FlowA,na.rm=T),max(FakeData$FlowB,na.rm=T))`. The `NA` values will not be plotted in your graph, but your code will not return an error. – Duck Aug 23 '20 at 12:35
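
The comments above can be sketched as follows, assuming a hypothetical ten-day gap in FlowA. `na.rm = TRUE` keeps the `NA` values from propagating into the scale factor. Base R's `approx()` is used here as one simple way to fill the gap by linear interpolation; a spline or a proper imputation method may suit real data better. The column name FlowA_filled is made up for this example.

```r
set.seed(123)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2001-12-31"), by = "day"),
  Elevation = runif(365, 806.8, 807.8),
  FlowA = runif(365, 8, 15),
  FlowB = runif(365, 1, 3)
)

# Hypothetical gap: blank out ten consecutive days of FlowA
FakeData$FlowA[100:109] <- NA

# Without na.rm = TRUE, max() returns NA and the scale factor becomes NA too
scalefactor <- max(FakeData$Elevation, na.rm = TRUE) /
  max(FakeData$FlowA, FakeData$FlowB, na.rm = TRUE)

# Optional: fill the gap by linear interpolation over time.
# approx() drops the NA pairs and interpolates at every date in xout.
FakeData$FlowA_filled <- approx(
  x = as.numeric(FakeData$Date),
  y = FakeData$FlowA,
  xout = as.numeric(FakeData$Date)
)$y
```

If the gap is left unfilled, geom_line() simply breaks the line where the values are missing, so interpolation is only needed when a continuous line is wanted.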

score 0 · Answer 2 · answered Aug 22 '20 at 17:40

The solution here works just fine for me, but I would like to add some adjustments because the code will be slightly different for your data. Following that solution, we use three geoms to represent Elevation, FlowA, and FlowB, and we add a secondary axis for FlowA and FlowB.

ggplot(FakeData, aes(x = Date))+
  geom_line(aes(y = Elevation)) +
  geom_col(aes(y = FlowA), fill="blue") +
  geom_col(aes(y = FlowB), fill='red')+
  scale_y_log10(sec.axis = sec_axis(~ .*1, labels = scales::number_format(scale=1/10),name="Flow"))

In the code above, Elevation is shown as a line and the flows as bars. Why did I use a logarithmic scale here? Because the range of the Elevation values (between 806.8 and 807.8) is very far from the range of the flow values. If you use a regular y-axis (scale_y_continuous()), you get the plot below:

[figure: the plot with a regular y-axis]

That plot is not very meaningful: you cannot see clearly how the flows change over time. Here is what it looks like with a logarithmic scale:

[figure: the plot with a logarithmic y-axis]

The left y-axis is logarithmic, and the right y-axis is derived from it through sec_axis() with relabelled ticks. Now the changes in the flows over time are clearly visible. Elevation appears as an almost straight line because its values only vary between 806.8 and 807.8, a tiny fraction of the axis range.

However, personally, I don't suggest you use a double y-axis because it confuses the plot user. I suggest you split the plot into two different plots.
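
As a rough sketch of the two-plot suggestion, assuming the same FakeData, the data can be reshaped to long format and drawn in two stacked panels with independent y-axes. The Series, Value and Panel column names are assumptions made for this example.

```r
library(ggplot2)
library(tidyr)

set.seed(123)
FakeData <- data.frame(
  Date = seq(as.Date("2001-01-01"), to = as.Date("2001-12-31"), by = "day"),
  Elevation = runif(365, 806.8, 807.8),
  FlowA = runif(365, 8, 15),
  FlowB = runif(365, 1, 3)
)

# One row per date and series
long <- pivot_longer(
  FakeData,
  cols = c(Elevation, FlowA, FlowB),
  names_to = "Series",
  values_to = "Value"
)

# Hypothetical grouping column: Elevation in one panel, both flows in the other
long$Panel <- ifelse(long$Series == "Elevation", "Elevation", "Flow")

p <- ggplot(long, aes(x = Date, y = Value, color = Series)) +
  geom_line() +
  # scales = "free_y" gives each panel its own y-axis range
  facet_wrap(~ Panel, ncol = 1, scales = "free_y")
p
```

Both panels share the Date axis, so the series stay aligned in time while each keeps a y-axis in its own units.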

Many thanks @Mathew, your solution has some good points, but it is not what I am looking for. – Hydro Aug 23 '20 at 03:03
There is also a danger in using `geom_col()` this way: it defaults to `position = "stack"`, which gives a distorted picture of the data on a log scale. – teunbrand Aug 24 '20 at 13:05
