I have a plot in which I show the sample mean for several binary variables, within two different populations. It currently looks like this:

[Figure: Sample means for categorical covariates]

I reshaped my data to build this plot, so the data and code for this graph look like this:

    > head(cat)
      hd var valor
    1  1 gen     1
    2  1 gen     0
    3  1 gen     0
    4  1 gen     0
    5  1 gen     0
    6  1 gen     0

    # This is my code
    library(ggplot2)

    ggplot(cat, aes(y = valor, x = as.factor(var), group = hd)) +
      # factor(hd) so that the manual fill scale receives a discrete variable
      geom_bar(aes(fill = factor(hd)),
               stat = 'summary',
               fun.y = mean,  # argument name in ggplot2 2.2.1 (renamed to `fun` in 3.3.0)
               position = 'dodge') +
      # mean_cl_normal requires the Hmisc package to be installed
      stat_summary(fun.data = mean_cl_normal,
                   geom = 'errorbar',
                   position = position_dodge(width = 0.85),
                   width = 0.2) +
      scale_x_discrete(labels = c('abogado_pub' = 'Public Lawyer',
                                  'codem' = 'Co-defendant',
                                  'gen' = 'Gender',
                                  'indem' = 'Severance Pay',
                                  'reinst' = 'Reinstatement',
                                  'sarimssinf' = 'Social Security',
                                  'trabajador_base' = 'At-will worker')) +
      scale_y_continuous(labels = scales::percent_format()) +
      labs(y = 'Percent', x = 'Variable') +
      scale_fill_manual(values = c('gray77', 'gray53'),
                        name = '',
                        labels = c('Pilot Data', 'Historic Data')) +
      theme_classic() +
      theme(axis.text.x = element_text(angle = 45, hjust = 1))

For each pair of bars, I want to add stars representing the significance level of a two-sided test for a difference in means between the two populations. I have tried several solutions like this one, but the annotations never show up in the plot. I suspect I am missing something about combining stat = 'summary' layers with stat = 'identity' layers, but I can't work out what it is. I also looked at this solution, but I don't know whether something similar is possible for my problem, since my annotation requires dropping one grouping level (hd): there is one annotation per variable, not one per bar.
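To make the target concrete, the stars I have in mind would come from a per-variable test roughly like the sketch below. It runs on a small made-up data frame `toy` with the same columns as `cat`. The helper `p_to_stars` and the cutoffs 0.01 / 0.05 / 0.1 are my own assumptions for illustration, not something taken from a package.

```r
# Made-up data with the same columns as `cat`; the values are illustrative only.
set.seed(1)
toy <- data.frame(
  hd    = rep(c(0, 1), each = 40),
  var   = rep(c("gen", "codem"), times = 40),
  valor = rbinom(80, size = 1, prob = 0.5)
)

# Hypothetical helper: map a p-value to stars (the cutoffs are an assumption).
p_to_stars <- function(p) {
  as.character(cut(p,
                   breaks = c(-Inf, 0.01, 0.05, 0.1, Inf),
                   labels = c("***", "**", "*", "")))
}

# One two-sided Welch t-test per variable, comparing hd == 0 against hd == 1.
stars <- do.call(rbind, lapply(split(toy, toy$var), function(d) {
  p <- t.test(valor ~ hd, data = d)$p.value
  data.frame(var   = as.character(d$var[1]),
             p     = p,
             stars = p_to_stars(p),
             y     = max(tapply(d$valor, d$hd, mean)),  # height of the taller bar
             stringsAsFactors = FALSE)
}))
```

The result has one row per variable rather than one per bar, which is why the annotation layer cannot share the hd grouping used by the bars and error bars.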

Some session info:

R version 3.4.0 (2017-04-21)
ggplot2_2.2.1
ggsignif_0.3.0

Thank you!

********************* EDIT *********************************

A reproducible example to generate a sample of my dataset:

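    set.seed(140692)
    # 7 variables x 10 repetitions = 70 rows, matching the 70 draws for hd and valor
    # (with rep(..., 20), data.frame() silently recycled hd and valor to 140 rows)
    cat <- data.frame(hd = sample(c(1, 0), 70, replace = TRUE),
                      var = rep(c('abogado_pub', 'codem', 'gen', 'indem',
                                  'reinst', 'sarimssinf', 'trabajador_base'), 10),
                      valor = sample(c(1, 0), 70, replace = TRUE))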
