
I have a df:

Sample_ID <- c("LSL Guideline", "USL Guideline", "P1014B", "P1014F", "P1036A", "P1036B", "P1036C", "P1036D", "P1036E", "P1036F")
`CONTAMINATION_SCORE (NA)` <- c(0, 3106, 2677, 1021, 870, 6831, 1324, 4175, 1370, 875)
`CONTAMINATION_P_VALUE (NA)` <- c(0.000, 0.049, 0.101, 1.000, 1.000, 0.000, 1.000, 0.036, 1.000, 1.000)
df <- data.frame(Sample_ID, `CONTAMINATION_SCORE (NA)`, `CONTAMINATION_P_VALUE (NA)`)

> df
       Sample_ID CONTAMINATION_SCORE..NA. CONTAMINATION_P_VALUE..NA.
1  LSL Guideline                        0                      0.000
2  USL Guideline                     3106                      0.049
3         P1014B                     2677                      0.101
4         P1014F                     1021                      1.000
5         P1036A                      870                      1.000
6         P1036B                     6831                      0.000
7         P1036C                     1324                      1.000
8         P1036D                     4175                      0.036
9         P1036E                     1370                      1.000
10        P1036F                      875                      1.000

I am trying to follow the guide here:

Combine bar and line chart in ggplot2

I want to plot all but the first 2 rows of df and have the following code. It nearly works as the guide describes, but the secondary axis isn't working: I want to see the CONTAMINATION_P_VALUE (NA) column drawn as a line against the secondary axis, as the guide shows.

  ggplot(df[-c(1,2),])  + 
    geom_bar(aes(x=Sample_ID, y=`CONTAMINATION_SCORE (NA)`),stat="identity", fill=rainbow(n=length(df$Sample_ID[-c(1:2)])))+
    geom_line(aes(x=Sample_ID, y=`CONTAMINATION_P_VALUE (NA)`),stat="identity",color="red")+
    labs(title= " QC",
         x="Sample ID",y=" Score") +
    scale_y_continuous(sec.axis=sec_axis(~.*0.01,name="Percentage"))


There are two issues with your code. First, you have to add the `group` aesthetic to `geom_line`, because your x variable is categorical (whereas in the blog post it is numeric). Second, when using a secondary axis you (!!) have to rescale the data yourself: `sec_axis()` only adds a relabelled copy of the primary axis and does not move the data drawn on the plot. In the blog post the values are multiplied by 100, in addition to the inverse transformation in `sec.axis`. The right factor always depends on the data being plotted. IMHO a factor of 10000 is more appropriate for your data; it gives you a secondary axis running from 0 to 100 percent.

Sample_ID <- c("LSL Guideline", "USL Guideline", "P1014B", "P1014F", "P1036A", "P1036B", "P1036C", "P1036D", "P1036E", "P1036F")
`CONTAMINATION_SCORE (NA)` <- c(0, 3106, 2677, 1021, 870, 6831, 1324, 4175, 1370, 875)
`CONTAMINATION_P_VALUE (NA)` <- c(0.000, 0.049, 0.101, 1.000, 1.000, 0.000, 1.000, 0.036, 1.000, 1.000)
df <- data.frame(Sample_ID, `CONTAMINATION_SCORE (NA)`, `CONTAMINATION_P_VALUE (NA)`, check.names = FALSE)

library(ggplot2)

# Drop the two guideline rows
df <- df[-c(1, 2), ]
pal <- rainbow(n = length(unique(df$Sample_ID)))

ggplot(df) +
  geom_col(aes(x = Sample_ID, y = `CONTAMINATION_SCORE (NA)`, fill = Sample_ID)) +
  # Rescale the p-values onto the score range; group = 1 joins the points across the categorical x
  geom_line(aes(x = Sample_ID, y = `CONTAMINATION_P_VALUE (NA)` * 10000, group = 1),
    color = "red"
  ) +
  scale_fill_manual(values = pal, guide = "none") +
  labs(title = " QC", x = "Sample ID", y = " Score") +
  # Undo the rescaling so the secondary axis shows the original values
  scale_y_continuous(sec.axis = sec_axis(~ . / 10000, name = "Percentage", labels = scales::percent))
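The factor of 10000 above is chosen by hand. As a sketch, assuming you want the largest p-value to line up with the tallest bar, the factor can instead be derived from the data. `scale_factor` is a hypothetical variable name, and the axis title "P-value" is an assumption based on the asker's note that the column holds p-values rather than percentages.

```r
library(ggplot2)

# Data from the question with the two guideline rows already dropped
df <- data.frame(
  Sample_ID = c("P1014B", "P1014F", "P1036A", "P1036B", "P1036C", "P1036D", "P1036E", "P1036F"),
  `CONTAMINATION_SCORE (NA)` = c(2677, 1021, 870, 6831, 1324, 4175, 1370, 875),
  `CONTAMINATION_P_VALUE (NA)` = c(0.101, 1.000, 1.000, 0.000, 1.000, 0.036, 1.000, 1.000),
  check.names = FALSE
)

# Hypothetical helper: maps the largest p-value onto the tallest bar
scale_factor <- max(df$`CONTAMINATION_SCORE (NA)`) / max(df$`CONTAMINATION_P_VALUE (NA)`)

p <- ggplot(df, aes(x = Sample_ID)) +
  geom_col(aes(y = `CONTAMINATION_SCORE (NA)`, fill = Sample_ID)) +
  # Multiply the line data by the factor so it is drawn on the primary (score) scale
  geom_line(aes(y = `CONTAMINATION_P_VALUE (NA)` * scale_factor, group = 1), color = "red") +
  scale_fill_manual(values = rainbow(nrow(df)), guide = "none") +
  labs(title = "QC", x = "Sample ID", y = "Score") +
  # Divide by the same factor so the secondary axis reads in the original p-value units
  scale_y_continuous(sec.axis = sec_axis(~ . / scale_factor, name = "P-value"))

p
```

The multiplication in `geom_line()` and the division in `sec_axis()` must use the same factor; otherwise the secondary axis labels no longer match the line.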

  • Ahhh, I see, thank you @stefan. Could you explain a little more what the group = 1 does? I don't quite understand... is it stipulating that each value in that column is a point on the x-axis? Also, my values for the line are actually p-values, not percentages. Do you have any ideas how I can scale the second y-axis better (e.g. to display both the 0.049 AND the 1.000 well)? – user483292 Oct 19 '22 at 06:12
  • If your x variable is categorical, ggplot2 will by default group (or split) the data by this variable (in fact by all categorical variables). For geom_line this means that only observations belonging to the same group are connected. `group=1` overrides the grouping and tells ggplot to treat all observations as belonging to one group, which we call 1; e.g. `group="foo"` will also work. See also https://stackoverflow.com/questions/10357768/plotting-lines-and-the-group-aesthetic-in-ggplot2 and https://ggplot2.tidyverse.org/reference/aes_group_order.html. – stefan Oct 19 '22 at 06:19
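A minimal sketch of the grouping behaviour described in the comment above, using a hypothetical three-row data frame `d`. Inspecting the `group` column via `layer_data()` shows one group per x level by default, and a single group once a constant `group` is mapped. When drawn, the default version typically produces no line and a warning that each group consists of only one observation.

```r
library(ggplot2)

# Hypothetical toy data: a categorical x and a numeric y
d <- data.frame(x = c("a", "b", "c"), y = c(1, 3, 2))

# Default: the discrete x splits the data into one group per level,
# so every group holds a single point and geom_line has nothing to connect
p_default <- ggplot(d, aes(x = x, y = y)) + geom_line()

# group = 1 puts every observation into the same group, so the points are joined
p_one <- ggplot(d, aes(x = x, y = y, group = 1)) + geom_line()

# Any constant works; the value itself is irrelevant
p_foo <- ggplot(d, aes(x = x, y = y, group = "foo")) + geom_line()

# Group ids that ggplot2 computed for each plot
layer_data(p_default)$group
layer_data(p_one)$group
layer_data(p_foo)$group
```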