
I want to create a bar plot in R with different variables in separate columns, all in one chart. So far I have only managed a two-variable plot with the following code:

barplot(table(y = cut$Gender,x = cut$Education))

Even so, Gender gets stacked on top of Education.

[Image: Respondents' Gender and Education level]

The type of chart I want looks like this: [image]

My sample dataset is:

structure(list(Gender = c("Male", "Male", "Male", "Male", "Male", 
"Male", "Male", "Male", "Female", "Male", "Male", "Male", "Male", 
"Female", "Male", "Female", "Male", "Male", "Male", "Male"), 
    Age = c("45-54 yrs", "35-44 yrs", "25-34 yrs", "25-34 yrs", 
    "25-34 yrs", "45-54 yrs", "25-34 yrs", "25-34 yrs", "25-34 yrs", 
    "35-44 yrs", "18-24 yrs", "25-34 yrs", "25-34 yrs", "55-64 yrs", 
    "35-44 yrs", "35-44 yrs", "35-44 yrs", "45-54 yrs", "35-44 yrs", 
    "45-54 yrs"), Employment = c("Civil servant", "Private sector", 
    "Private sector", "Private sector", "Trader", "Civil servant", 
    "Private sector", "Private sector", "Private sector", "Civil servant", 
    "Student", "Student", "Civil servant", "Retired", "Self-employed", 
    "Private sector", "Civil servant", "Civil servant", "Private sector", 
    "Private sector"), Marriage = c("Married", "Married", "Married", 
    "Married", "Single, never married", "Married", "Married", 
    "Married", "Married", "Married", "Single, never married", 
    "Single, never married", "Married", "Married", "Married", 
    "Married", "Married", "Married", "Married", "Married"), Education = c("Advanced degree", 
    "Advanced degree", "Bachelor's degree", "Bachelor's degree", 
    "Secondary education", "Advanced degree", "Bachelor's degree", 
    "Bachelor's degree", "Secondary education", "Secondary education", 
    "Secondary education", "Secondary education", "Advanced degree", 
    "Bachelor's degree", "Basic education", "Advanced degree", 
    "Advanced degree", "Advanced degree", "Advanced degree", 
    "Advanced degree"), Residence = c("Ashanti", "Ashanti", "Ashanti", 
    "Ashanti", "Ashanti", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", 
    "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", "Brong-Ahafo", 
    "Central", "Central", "Eastern", "Greater Accra", "Greater Accra", 
    "Greater Accra", "Greater Accra", "Greater Accra"), Experience = c("Never", 
    "Never", "Never", "Never", "Never", "Never", "Never", "Never", 
    "Never", "Never", "Never", "Never", "Never", "Never", "Never", 
    "Never", "Never", "Never", "Never", "Never")), .Names = c("Gender", 
"Age", "Employment", "Marriage", "Education", "Residence", "Experience"
), row.names = c(NA, 20L), class = "data.frame")
Masssly

1 Answer

Here is an approach:

First convert the data to long format. There are two options: melt from the reshape package or gather from tidyr. Here I use the tidyverse library, which loads tidyr along with many other useful packages.

library(tidyverse)

df %>%
  gather(variable, value)
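
To make the long format concrete, here is a minimal sketch on a small made-up data frame (`toy` is a hypothetical stand-in for `df`). It also shows `pivot_longer`, the newer tidyr interface that supersedes `gather`; both give one row per original cell, although the row order differs.

```r
library(tidyr)

# Hypothetical toy data standing in for `df`
toy <- data.frame(
  Gender = c("Male", "Female", "Male"),
  Age    = c("25-34 yrs", "35-44 yrs", "25-34 yrs"),
  stringsAsFactors = FALSE
)

# Approach used in this answer: stack all columns into variable/value pairs
long_gather <- gather(toy, variable, value)

# Equivalent with the newer interface (rows come out in a different order)
long_pivot <- pivot_longer(toy, cols = everything(),
                           names_to = "variable", values_to = "value")

print(long_gather)
```

Either result can be passed on to ggplot in the same way.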

Then make a bar plot with ggplot2

ggplot()+
     geom_bar(aes(x = variable, fill = value), color = "black" , position = "stack", show.legend = FALSE)

To add text annotations, add a geom_text layer. The label positions are determined by stat = "count", which computes a special variable ..count..; combined with position = "stack", it corresponds to the top of each bar segment. Since labels placed exactly there look a bit crude, shift them inside the segments with vjust = 1:

geom_text(stat = "count", aes(x = variable, label =  value,
                              y = ..count..,
                              group = value),
          position = "stack", vjust = 1)

To show percentages on the y axis, the usual approach is y = (..count..)/sum(..count..). Here, however, sum(..count..) is the sum of counts across all variables, so it is not appropriate. The easiest solution is to label the axis manually:

scale_y_continuous(labels =  c("0%", "25%", "50%", "75%", "100%"),
                   breaks = c(0, 5, 10, 15, 20))
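
The breaks above are hard-coded for the 20 rows of the sample data. As a sketch, they can be derived from the row count instead; the names `n`, `pct_breaks` and `pct_labels` are made up for illustration, and `n` would be `nrow(df)` in practice.

```r
# Number of respondents; 20 in the sample data (in practice: nrow(df))
n <- 20

# Five evenly spaced breaks from 0 to n, labelled as percentages of n
pct_breaks <- seq(0, n, length.out = 5)
pct_labels <- paste0(100 * pct_breaks / n, "%")

print(pct_breaks)
print(pct_labels)
```

These vectors can then be passed as `breaks = pct_breaks, labels = pct_labels` to `scale_y_continuous`.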

How it looks all together:

library(tidyverse)

df %>%
  gather(variable, value) %>%
  ggplot() +
  geom_bar(aes(x = variable, fill = value),
           color = "black",
           position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable,
                label = value,
                y = ..count..,
                group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"),
                     breaks = c(0, 5, 10, 15, 20))
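
Note that recent ggplot2 releases deprecate the `..count..` notation in favour of `after_stat(count)`. The following is a sketch of the same plot in that style, assuming a ggplot2 version that provides `after_stat()`; `toy` is a hypothetical four-row stand-in for `df`, so the breaks run from 0 to 4 instead of 0 to 20.

```r
library(ggplot2)
library(tidyr)

# Hypothetical toy data standing in for `df`
toy <- data.frame(
  Gender   = c("Male", "Male", "Female", "Male"),
  Marriage = c("Married", "Single", "Married", "Married")
)

long <- pivot_longer(toy, cols = everything(),
                     names_to = "variable", values_to = "value")

p <- ggplot(long) +
  geom_bar(aes(x = variable, fill = value),
           color = "black", position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable, label = value,
                y = after_stat(count),  # replaces ..count..
                group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"),
                     breaks = c(0, 1, 2, 3, 4))

print(p)
```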


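Another option is y = ..count../sum(..count..)*7. Since there are 7 variables with the same number of rows, multiplying by 7 rescales each count to a proportion within its own variable: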

df %>%
  gather(variable, value) %>%
  ggplot() +
  geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value),
           color = "black", position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable, label = value,
                y = ..count../sum(..count..)*7, group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = scales::percent) +
  ylab("")

This produces the same plot as above.
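
To see why multiplying by 7 works, here is the arithmetic for one bar segment, using the 17 "Male" rows in the sample data. The variable names are made up for illustration.

```r
n_rows <- 20   # respondents in the sample data
n_vars <- 7    # columns stacked into the long format
male   <- 17   # rows with Gender == "Male" in the sample data

# sum(..count..) covers every variable, so it equals n_rows * n_vars
total <- n_rows * n_vars

# Rescaling by the number of variables recovers the within-variable share
share_rescaled <- male / total * n_vars
share_direct   <- male / n_rows

print(c(share_rescaled, share_direct))
```

The two values agree only because every variable has the same number of non-missing rows; with unequal counts, the proportion would need to be computed per variable.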

You can even add conditional line breaks to the labels using mutate with gsub and a negative lookahead, which here replaces every space except one followed by "yrs":

df %>%
  gather(variable, value) %>%
  mutate(label = gsub(" (?!yrs)", "\n", value, perl = TRUE)) %>%
  ggplot() +
  geom_bar(aes(x = variable, y = ..count../sum(..count..)*7, fill = value),
           color = "black", position = "stack", show.legend = FALSE) +
  geom_text(stat = "count",
            aes(x = variable, label = label,
                y = ..count../sum(..count..)*7, group = value),
            position = "stack", vjust = 1) +
  scale_y_continuous(labels = scales::percent) +
  ylab("")
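
The effect of the lookahead pattern can be checked in isolation on a few values from the sample data (`x` and `broken` are illustrative names):

```r
x <- c("25-34 yrs", "Single, never married", "Bachelor's degree")

# Replace each space with a newline unless it is immediately followed by "yrs"
broken <- gsub(" (?!yrs)", "\n", x, perl = TRUE)

print(broken)
```

The age band keeps its space, while the other labels are split onto one word per line.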


missuse
  • Thank you. Is there a way to replace the count_frequency (0 to 20) with percentage (as in 0-100%). – Masssly Oct 10 '17 at 17:53
  • just add `scale_y_continuous(labels = c("0%", "25%", "50%", "75%", "100%"), breaks = c(0, 5, 10, 15, 20))`. There are other ways but I trust this is easiest in current example – missuse Oct 10 '17 at 17:59