I have already read through the similar questions and answers, but none of the suggested solutions using scale_y_continuous work for my dataset. I have two treatment groups, data$InvA (Post-Covid) and data$InvAcc (Pre-Covid), and in each group the subjects could choose between three options: Online Broker (1), Bank (2), No Account (3). Because subjects were randomly assigned to one of the two groups, each column contains a lot of NAs. With ggplot I can display both results with the total number of individuals on the y-axis. However, I would like to show percentages instead, since that would be a better fit for my thesis. I have tried every scale_y_continuous option I could find, but either the result is wrong (the axis runs up to 3000%, or the percentages are not calculated correctly) or it doesn't work at all.

This is my code:

    library(gridExtra)
    library(ggplot2)
    library(tidyverse)

    plot1 <- ggplot(data = data, aes(InvA), na.rm = TRUE) +
      geom_bar() +
      scale_x_discrete(na.translate = FALSE) +
      ylim(0, 40) +
      ggtitle("Post-Covid") +
      xlab("Accounts") +
      ylab("Total No. of Individuals")

    plot2 <- ggplot(data = data, aes(InvAcc), na.rm = TRUE) +
      geom_bar() +
      scale_x_discrete(na.translate = FALSE) +
      ylim(0, 40) +
      ggtitle("Pre-Covid") +
      xlab("Accounts") +
      ylab("Total No. of Individuals")

    grid.arrange(plot2, plot1, ncol = 2) # Write the grid.arrange in the file
    #dev.off() # Close the file
    #pdf("Accountss.pdf", width = 8, height = 6) # Open a new pdf file

My data:

dput(data)

structure(list(data.InvAcc = c(2L, NA, 2L, NA, NA, 3L, 3L, 3L, 
NA, 3L, 3L, NA, 1L, NA, 1L, NA, NA, 1L, NA, NA, NA, 1L, 3L, 1L, 
NA, NA, 1L, 2L, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 2L, NA, 
NA, 3L, NA, NA, 1L, NA, 2L, NA, NA, NA, NA, NA, NA, NA, NA, 1L, 
1L, 1L, 1L, NA, NA, NA, 3L, NA, 1L, NA, NA, 2L, NA, 1L, 1L, 1L, 
NA, 1L, 3L, NA, 1L, NA, 3L, NA, NA, 2L, 3L, 2L, 1L, NA, 3L, 2L, 
NA, NA, 3L, NA, 2L, 1L, NA, 3L, 2L, 1L, 3L, 3L, 3L, NA, 3L, NA, 
3L, NA, 3L, 1L, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, 3L, NA, 
NA, 3L, 3L, 3L, 3L, NA, 1L, NA, NA, NA, 3L, NA, 3L), data.InvA = c(NA, 
1L, NA, 2L, 1L, NA, NA, NA, 3L, NA, NA, 3L, NA, 3L, NA, 1L, 2L, 
NA, 1L, 1L, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 1L, NA, NA, NA, 
NA, NA, NA, NA, 3L, NA, 1L, 1L, NA, 1L, 1L, NA, 1L, NA, 1L, 3L, 
1L, 1L, 1L, 2L, 1L, 1L, NA, NA, NA, NA, 1L, 1L, 1L, NA, 2L, NA, 
2L, 1L, NA, 2L, NA, NA, NA, 2L, NA, NA, 2L, NA, 1L, NA, 3L, 3L, 
NA, NA, NA, NA, 1L, NA, NA, 1L, 2L, NA, 1L, NA, NA, 1L, NA, NA, 
NA, NA, NA, NA, 1L, NA, 1L, NA, 1L, NA, NA, 1L, 1L, 3L, NA, 1L, 
2L, 2L, NA, 1L, 1L, NA, 3L, 1L, NA, NA, NA, NA, 1L, NA, 1L, 3L, 
1L, NA, 3L, NA)), class = "data.frame", row.names = c(NA, -133L
))
data$InvAcc: Online Broker --> 31 (45%), Bank --> 11 (16%), No Account --> 27 (39%)

data$InvA: Online Broker --> 40 (63%), Bank --> 13 (20%), No Account --> 11 (17%)

Thank you all for your help, appreciate your time!

jrcalabrese
Kristian
  • Please provide an [MRE](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and paste the output of `dput` for example data, then it's easier to help you, thanks! – starja Aug 12 '22 at 08:00
  • `+ scale_y_continuous(labels=scales::percent)` ? – Yacine Hajji Aug 12 '22 at 08:07
  • @YacineHajji - doesn't work for me, for the pre-covid group my y-axis then goes from 0 to 3000% and for the post-covid group from 0 to 4000% – Kristian Aug 12 '22 at 08:23

1 Answer

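The issue is that you are plotting the counts. If you want to plot percentages, then you have to tell ggplot to do so, e.g. with y = after_stat(prop), which maps the proportions instead of the counts to y. This is also why adding scale_y_continuous(labels = scales::percent) to the count plot shows 3000%: the labeller multiplies the values by 100, so a count of 30 becomes 3000%. Afterwards you can get percent labels using scales::percent: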

    library(gridExtra)
    library(ggplot2)

    # ylim(0, 40) is dropped: scale_y_continuous() would replace it anyway,
    # and proportions lie between 0 and 1.
    plot1 <- ggplot(data = data, aes(InvA, y = after_stat(prop)), na.rm = TRUE) +
      geom_bar() +
      scale_x_discrete(na.translate = FALSE) +
      ggtitle("Post-Covid") +
      xlab("Accounts") +
      ylab("Share of Individuals") +
      scale_y_continuous(labels = scales::percent)

    plot2 <- ggplot(data = data, aes(InvAcc, y = after_stat(prop)), na.rm = TRUE) +
      geom_bar() +
      scale_x_discrete(na.translate = FALSE) +
      ggtitle("Pre-Covid") +
      xlab("Accounts") +
      ylab("Share of Individuals") +
      scale_y_continuous(labels = scales::percent)

    grid.arrange(plot2, plot1, ncol = 2)
    #> Warning: Removed 64 rows containing non-finite values (stat_count).
    #> Warning: Removed 69 rows containing non-finite values (stat_count).
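
As a rough cross-check of what after_stat(prop) computes here, the shares can be reproduced in base R. This is only a minimal sketch: inv_acc is a hypothetical vector rebuilt from the Pre-Covid counts reported in the question (31, 11 and 27 answers plus 64 NAs), not the original data frame.

```r
# Hypothetical stand-in for data$InvAcc, rebuilt from the counts in the question:
# 31 x Online Broker (1), 11 x Bank (2), 27 x No Account (3), 64 x NA.
inv_acc <- c(rep(1L, 31), rep(2L, 11), rep(3L, 27), rep(NA_integer_, 64))

counts <- table(inv_acc)      # table() drops NA by default
shares <- prop.table(counts)  # count / sum(count), i.e. the proportion per option

# Shares in percent, rounded to whole numbers
pct <- round(100 * shares)
print(pct)

# A percent labeller multiplies by 100, so formatting raw counts instead of
# proportions turns a count of about 30 into about 3000%.
print(100 * as.vector(counts))
```

The rounded shares come out as 45, 16 and 39, matching the percentages listed in the question, while the last line shows where axis values in the thousands of percent come from.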

stefan
  • Thank you @stefan, really saved my day! Only one remaining question: how can I label in ggplot single bars, in my example Online Broker, Bank, No Account as in my image before where I didn't have the values in percent? – Kristian Aug 12 '22 at 09:06
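
For the follow-up about labelling the individual bars, one possible approach (a sketch, not part of the answer above) is to turn the numeric codes into a labelled factor before plotting. The data frame df below is hypothetical and rebuilt from the Post-Covid counts in the question (40, 13, 11 plus 69 NAs). Note that with a factor on x, after_stat(prop) is computed per bar unless all bars are put into one group, hence group = 1.

```r
library(ggplot2)

# Hypothetical stand-in for the Post-Covid column, rebuilt from the question's counts
df <- data.frame(InvA = c(rep(1L, 40), rep(2L, 13), rep(3L, 11), rep(NA, 69)))

# Map the codes 1/2/3 to readable labels
df$InvA <- factor(df$InvA, levels = 1:3,
                  labels = c("Online Broker", "Bank", "No Account"))

# Drop the NA rows up front, then plot proportions over all remaining rows
p <- ggplot(subset(df, !is.na(InvA)),
            aes(x = InvA, y = after_stat(prop), group = 1)) +
  geom_bar() +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  labs(title = "Post-Covid", x = "Accounts", y = "Share of Individuals")

print(p)
```

Filtering the NA rows before plotting also avoids the "Removed ... rows" warnings, and fixing limits = c(0, 1) keeps the two panels on the same scale if the same recipe is applied to the Pre-Covid column.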