I have already read through the other similar questions and answers, but none of the suggested solutions using scale_y_continuous work for my dataset. I have two treatment groups, data$InvA (Post-Covid) and data$InvAcc (Pre-Covid), and in each group the subjects could choose between three options: Online Broker (1), Bank (2), No Account (3). Because the subjects were randomly assigned to group 1 or 2, each column naturally contains a lot of NAs. With ggplot I am able to display both results with the total number of individuals on the y-axis. However, I would like to change this to percent, since it would be a better fit for my thesis. I have already tried every variant of scale_y_continuous I could find, but either it gives the wrong result (the axis shows 3000%, or the percent values are not calculated correctly) or it doesn't work at all.
This is my code:
library(gridExtra) # provides grid.arrange()
library(ggplot2)
library(tidyverse)
plot1 <- ggplot(data = data, aes(InvA)) +
geom_bar(na.rm = TRUE)+ # na.rm is an argument of the geom; ggplot() ignores it
scale_x_discrete(na.translate = FALSE)+
ylim(0,40)+
ggtitle("Post-Covid")+
xlab("Accounts")+
ylab("Total No. of Individuals")
plot2 <- ggplot(data = data, aes(InvAcc)) +
geom_bar(na.rm = TRUE)+ # na.rm is an argument of the geom; ggplot() ignores it
scale_x_discrete(na.translate = FALSE)+
ylim(0,40)+
ggtitle("Pre-Covid")+
xlab("Accounts")+
ylab("Total No. of Individuals")
#pdf("Accountss.pdf", width = 8, height = 6) # Open a new pdf file
grid.arrange(plot2, plot1, ncol = 2) # Draw both plots side by side (goes into the file when pdf() is active)
#dev.off() # Close the file
My data:
dput(data)
structure(list(data.InvAcc = c(2L, NA, 2L, NA, NA, 3L, 3L, 3L,
NA, 3L, 3L, NA, 1L, NA, 1L, NA, NA, 1L, NA, NA, NA, 1L, 3L, 1L,
NA, NA, 1L, 2L, NA, NA, 1L, 1L, 1L, 1L, 1L, 1L, 1L, NA, 2L, NA,
NA, 3L, NA, NA, 1L, NA, 2L, NA, NA, NA, NA, NA, NA, NA, NA, 1L,
1L, 1L, 1L, NA, NA, NA, 3L, NA, 1L, NA, NA, 2L, NA, 1L, 1L, 1L,
NA, 1L, 3L, NA, 1L, NA, 3L, NA, NA, 2L, 3L, 2L, 1L, NA, 3L, 2L,
NA, NA, 3L, NA, 2L, 1L, NA, 3L, 2L, 1L, 3L, 3L, 3L, NA, 3L, NA,
3L, NA, 3L, 1L, NA, NA, NA, 1L, NA, NA, NA, 1L, NA, NA, 3L, NA,
NA, 3L, 3L, 3L, 3L, NA, 1L, NA, NA, NA, 3L, NA, 3L), data.InvA = c(NA,
1L, NA, 2L, 1L, NA, NA, NA, 3L, NA, NA, 3L, NA, 3L, NA, 1L, 2L,
NA, 1L, 1L, 1L, NA, NA, NA, 1L, 2L, NA, NA, 2L, 1L, NA, NA, NA,
NA, NA, NA, NA, 3L, NA, 1L, 1L, NA, 1L, 1L, NA, 1L, NA, 1L, 3L,
1L, 1L, 1L, 2L, 1L, 1L, NA, NA, NA, NA, 1L, 1L, 1L, NA, 2L, NA,
2L, 1L, NA, 2L, NA, NA, NA, 2L, NA, NA, 2L, NA, 1L, NA, 3L, 3L,
NA, NA, NA, NA, 1L, NA, NA, 1L, 2L, NA, 1L, NA, NA, 1L, NA, NA,
NA, NA, NA, NA, 1L, NA, 1L, NA, 1L, NA, NA, 1L, 1L, 3L, NA, 1L,
2L, 2L, NA, 1L, 1L, NA, 3L, 1L, NA, NA, NA, NA, 1L, NA, 1L, 3L,
1L, NA, 3L, NA)), class = "data.frame", row.names = c(NA, -133L
))
These are the counts and the percentages I expect to see on the y-axis (NAs excluded):
data$InvAcc: Online Broker --> 31 (45%), Bank --> 11 (16%), No Account --> 27 (39%)
data$InvA: Online Broker --> 40 (63%), Bank --> 13 (20%), No Account --> 11 (17%)
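To make clear how these percentages are meant, here is a minimal sketch of the calculation in base R: each count is divided by the number of non-missing answers in its own group (69 for Pre-Covid, 64 for Post-Covid), not by all 133 rows. The vectors inv_acc and inv_a and the helper pct_of_answers are hypothetical names used only for this illustration; the vectors are rebuilt from the counts above rather than taken from the real data frame.

```r
# Hypothetical reconstruction of the two columns from the counts listed above;
# the order of the values does not matter for the percentages.
inv_acc <- c(rep(1L, 31), rep(2L, 11), rep(3L, 27), rep(NA_integer_, 64)) # Pre-Covid
inv_a   <- c(rep(1L, 40), rep(2L, 13), rep(3L, 11), rep(NA_integer_, 69)) # Post-Covid

# Share of each answer among the non-missing responses, in percent.
pct_of_answers <- function(x) {
  labelled <- factor(x, levels = 1:3,
                     labels = c("Online Broker", "Bank", "No Account"))
  counts <- table(labelled)            # table() drops NA by default
  round(100 * counts / sum(counts), 1) # one decimal place
}

pct_pre  <- pct_of_answers(inv_acc) # 44.9, 15.9, 39.1
pct_post <- pct_of_answers(inv_a)   # 62.5, 20.3, 17.2

print(pct_pre)
print(pct_post)
```

The figures in my list above are these values rounded to whole percent. The plot should show the same numbers, with each panel normalised by its own group size.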
Thank you all for your help, appreciate your time!