I have a dataframe with the following structure:
var1 <- c(0,1,0,1,1,0,1,1,0,1,1,1,1,1,1)
var2 <- c(0,0,1,1,0,0,1,1,0,1,0,0,1,1,0)
var3 <- c(1,1,1,0,1,0,1,0,0,1,1,1,1,0,1)
var4 <- c(1,0,0,0,1,0,1,1,0,0,1,1,1,0,1)
var5 <- c(1,0,0,1,0,1,1,0,0,0,1,1,1,1,1)
var6 <- c(0,0,1,0,1,0,1,1,1,1,0,0,0,0,0)
group <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)
numb <- 1:15
df <- data.frame(numb,group,var1,var2,var3,var4,var5,var6)
df$var1 <- factor(df$var1)
df$var2 <- factor(df$var2)
df$var3 <- factor(df$var3)
df$var4 <- factor(df$var4)
df$var5 <- factor(df$var5)
df$var6 <- factor(df$var6)
df$group <- factor(df$group)
summary(df)
      numb      group  var1    var2   var3    var4   var5   var6
 Min.   : 1.0   0:7    0: 4    0:8    0: 5    0:7    0:6    0:9
 1st Qu.: 4.5   1:8    1:11    1:7    1:10    1:8    1:9    1:6
 Median : 8.0
 Mean   : 8.0
 3rd Qu.:11.5
 Max.   :15.0
I want to make a combined dodged barplot of all these variables, split by the group factor.
What I have managed so far is a barplot built from the original dataframe reshaped into long format with reshape2:
# reshape2::melt() names this argument measure.vars
df_long <- reshape2::melt(df,
                          id.vars = c("group", "numb"),
                          measure.vars = c("var1", "var2", "var3", "var4", "var5", "var6"))
Then I count the percentages of observed variables:
library(dplyr)
df_pct <- df_long %>%
  count(value, group, variable) %>%
  mutate(pct = prop.table(n))
And plot the graph with ggplot2:
library(ggplot2)
ggplot(data = df_pct, aes(x = variable, y = pct, fill = group, label = scales::percent(pct))) +
  geom_col(position = 'dodge') +
  geom_text(position = position_dodge(width = .9), # move to center of bars
            vjust = -0.5,                          # nudge above top of bar
            size = 3) +
  scale_y_continuous(labels = scales::percent)
But the plot I get apparently has a problem with the percentages it shows. I was expecting the graph to count only the cases where df[df$var* == 1, ] (variable present). I'm not certain what the bars count in my example, and it clearly shows percentages for both df[df$var* == 1, ] and df[df$var* == 0, ] (variable absent).
Could anyone please help with the graph? What do I need to do to display the percentages correctly? How do I plot the graph so that it counts only the "1" values of the variables?
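To make the target concrete, here is a minimal sketch of one possible calculation. It assumes the intended percentage is the share of rows within each group where a variable equals 1. That is only one reading of the question. For context, prop.table(n) on an ungrouped frame divides each count by the sum of all counts, so every bar becomes a fraction of all 90 long-format rows rather than of its own group. The names vars and df_ones are introduced here purely for illustration.

```r
library(dplyr)
library(ggplot2)

# Same example data as in the question, built compactly
df <- data.frame(
  numb  = 1:15,
  group = factor(c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)),
  var1  = c(0,1,0,1,1,0,1,1,0,1,1,1,1,1,1),
  var2  = c(0,0,1,1,0,0,1,1,0,1,0,0,1,1,0),
  var3  = c(1,1,1,0,1,0,1,0,0,1,1,1,1,0,1),
  var4  = c(1,0,0,0,1,0,1,1,0,0,1,1,1,0,1),
  var5  = c(1,0,0,1,0,1,1,0,0,0,1,1,1,1,1),
  var6  = c(0,0,1,0,1,0,1,1,1,1,0,0,0,0,0)
)
vars <- paste0("var", 1:6)            # hypothetical helper name
df[vars] <- lapply(df[vars], factor)  # factors, as in the question

df_long <- reshape2::melt(df, id.vars = c("group", "numb"), measure.vars = vars)

# Assumption: "percentage" = share of rows equal to 1 within each group/variable
df_ones <- df_long %>%
  group_by(group, variable) %>%
  summarise(pct = mean(as.character(value) == "1"), .groups = "drop")

# Same plot layers as before, now with one bar per group and variable
p <- ggplot(df_ones, aes(x = variable, y = pct, fill = group,
                         label = scales::percent(pct))) +
  geom_col(position = "dodge") +
  geom_text(position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
  scale_y_continuous(labels = scales::percent)
p
```

With this grouping, df_ones has one row per group and variable, so the "0" values no longer get bars of their own. If the denominator should be something else (for example all 15 rows), the mean() step would need to change accordingly.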