
I have a data frame with the following structure:

var1 <- c(0,1,0,1,1,0,1,1,0,1,1,1,1,1,1)
var2 <- c(0,0,1,1,0,0,1,1,0,1,0,0,1,1,0)
var3 <- c(1,1,1,0,1,0,1,0,0,1,1,1,1,0,1)
var4 <- c(1,0,0,0,1,0,1,1,0,0,1,1,1,0,1)
var5 <- c(1,0,0,1,0,1,1,0,0,0,1,1,1,1,1)
var6 <- c(0,0,1,0,1,0,1,1,1,1,0,0,0,0,0)
group <- c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1)
numb <- 1:15
df <- data.frame(numb,group,var1,var2,var3,var4,var5,var6)
df$var1 <- factor(df$var1)
df$var2 <- factor(df$var2)
df$var3 <- factor(df$var3)
df$var4 <- factor(df$var4)
df$var5 <- factor(df$var5)
df$var6 <- factor(df$var6)
df$group <- factor(df$group)
summary(df)
     numb      group var1   var2  var3   var4  var5  var6 
 Min.   : 1.0   0:7   0: 4   0:8   0: 5   0:7   0:6   0:9  
 1st Qu.: 4.5   1:8   1:11   1:7   1:10   1:8   1:9   1:6  
 Median : 8.0                                              
 Mean   : 8.0                                              
 3rd Qu.:11.5                                              
 Max.   :15.0                                              

I want to make a combined dodged bar plot of all these variables, split by the group factor.

So far I have managed to make a bar plot by reshaping the original data frame into long format with reshape2:

df_long <- reshape2::melt(df, measure.vars = c("var1","var2","var3","var4","var5","var6"), id.vars = c("group","numb"))

Then I compute the percentages of the observed values:

library(dplyr)
df_pct <- df_long %>% 
    count(value, group, variable) %>% 
    mutate(pct = prop.table(n))

And plot the graph with ggplot2:

library(ggplot2)
ggplot(data = df_pct, aes(x = variable, y = pct, fill = group, label = scales::percent(pct))) + 
  geom_col(position = 'dodge') + 
  geom_text(position = position_dodge(width = .9),    # move to center of bars
            vjust = -0.5,    # nudge above top of bar
            size = 3) +
  scale_y_continuous(labels = scales::percent)

But the percentages in the resulting plot are clearly wrong. I expected the graph to count only the cases where df[df$var* == 1, ] (variable present). I'm not sure what the bars in my example actually count; the plot evidently shows percentages for both df[df$var* == 1, ] and df[df$var* == 0, ] (variable absent).

Could anyone please help with the graph? What do I need to do to display the percentages correctly? How do I plot the graph so that it counts only the "1" values of the variables?
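
To make the desired output concrete, here is a minimal sketch of one possible approach, not a confirmed solution. It assumes that "percentage" means the share of rows with value 1 within each group and variable. For context: in the pipeline above, count(value, group, variable) keeps the 0 rows, and prop.table(n) divides each count by the total of all counts in the long table, not by the group size. The sketch rebuilds the same toy data so it runs on its own; the .groups argument assumes dplyr 1.0 or later.

```r
library(dplyr)
library(ggplot2)

# Same toy data as in the question, rebuilt compactly
df <- data.frame(
  numb  = 1:15,
  group = c(0,0,0,0,0,0,0,1,1,1,1,1,1,1,1),
  var1  = c(0,1,0,1,1,0,1,1,0,1,1,1,1,1,1),
  var2  = c(0,0,1,1,0,0,1,1,0,1,0,0,1,1,0),
  var3  = c(1,1,1,0,1,0,1,0,0,1,1,1,1,0,1),
  var4  = c(1,0,0,0,1,0,1,1,0,0,1,1,1,0,1),
  var5  = c(1,0,0,1,0,1,1,0,0,0,1,1,1,1,1),
  var6  = c(0,0,1,0,1,0,1,1,1,1,0,0,0,0,0)
)
df[-1] <- lapply(df[-1], factor)  # group and var1..var6 become factors

# Long format: one row per (numb, variable)
df_long <- reshape2::melt(df,
                          id.vars = c("group", "numb"),
                          measure.vars = paste0("var", 1:6))

# Assumption: the wanted number is the share of "1" values
# within each group/variable combination
df_pct <- df_long %>%
  group_by(group, variable) %>%
  summarise(pct = mean(value == "1"), .groups = "drop")

# Same plot as before, now with one bar per group and variable
p <- ggplot(df_pct, aes(x = variable, y = pct, fill = group,
                        label = scales::percent(pct, accuracy = 1))) +
  geom_col(position = "dodge") +
  geom_text(position = position_dodge(width = 0.9),  # center labels on bars
            vjust = -0.5,                            # nudge above top of bar
            size = 3) +
  scale_y_continuous(labels = scales::percent)
print(p)
```

With this, df_pct has 12 rows (2 groups x 6 variables), and each bar is the fraction of cases in that group where the variable equals 1, so the two groups can be compared directly even though they differ in size (7 vs. 8 rows).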


  • Please edit your question to include the output of `dput(df)`, or create some 'fake' data per https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – jared_mamrot Apr 22 '21 at 00:22
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It is not easy to copy/paste the data into R in the format you've used. – MrFlick Apr 22 '21 at 00:40
  • What percentages would you like to display there? – AnilGoyal Apr 22 '21 at 06:15
  • @MrFlick Sorry, I thought that would suffice. I added a toy example. – Tony Zhelonkin Apr 22 '21 at 06:28
  • @AnilGoyal I wanted both the bars and the percentages to count only the cases where df$var* == 1, so that I could compare the two groups. I'm not sure what the bars in my graph count, but as I understand it, they show percentages for both df$var* == 1 and df$var* == 0. – Tony Zhelonkin Apr 22 '21 at 06:33
