R ggplot stacked bar label spacing

Question

I am working on a stacked bar chart that illustrates cumulative percentages.

I would like to show the percentage values on the bars, but they need to be spaced so that they don't overlap while still being intuitively associated with their parts of the bar. That means enforcing a minimum buffer between labels, but otherwise placing each label within its own section of the bar.

Here is a dput() of the data used to produce the figure below:

data<-structure(list(dimension = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("Achievement", 
    "Connection", "Leadership", "Lifestyle", "Partnership", "Support"
    ), class = "factor"), area = structure(c(21L, 21L, 24L, 24L, 
    23L, 23L, 23L, 18L, 18L, 18L, 15L, 15L, 21L, 21L, 24L, 24L, 23L, 
    23L, 23L, 18L, 18L, 18L, 15L, 15L, 21L, 21L, 24L, 24L, 23L, 23L, 
    23L, 18L, 18L, 18L, 15L, 15L, 21L, 21L, 24L, 24L, 23L, 23L, 23L, 
    18L, 18L, 18L, 15L, 15L), .Label = c("Appreciation", "Balance", 
    "Belonging", "Brand Passion", "Burnout", "Care", "Communication", 
    "Competence", "Consultation", "Family and Social Support", "Financial Performance", 
    "Forward Vision", "Fulfilment", "Harmony", "Innovation", "Integrity", 
    "Leadership Commitment", "Marketing Support", "Optimism", "Participation", 
    "Practical Support", "Satisfied Expectations", "Systems", "Training"
    ), class = "factor"), label = structure(c(31L, 32L, 43L, 50L, 
    6L, 55L, 33L, 42L, 41L, 4L, 45L, 44L, 31L, 32L, 43L, 50L, 6L, 
    55L, 33L, 42L, 41L, 4L, 45L, 44L, 31L, 32L, 43L, 50L, 6L, 55L, 
    33L, 42L, 41L, 4L, 45L, 44L, 31L, 32L, 43L, 50L, 6L, 55L, 33L, 
    42L, 41L, 4L, 45L, 44L), .Label = c("Able to contribute views", 
    "Achieving desired lifestyle", "Believe profits will grow", "Brand helps us compete", 
    "Business provides flexibility", "Business systems improve productivity", 
    "Can discuss differences openly", "Decisions have long-term focus", 
    "Enjoy running the business", "Expectations have been met", "Family positive about the business", 
    "Family supportive of the business", "Feel connected to other Franchisees", 
    "Feel informed on important issues", "Feel optimistic about future", 
    "Feel part of community", "Feel we belong to group", "Financial security has improved", 
    "Franchisees are appreciated", "Franchisees are recognised", 
    "Franchisees are respected", "Franchisor cares about our profits", 
    "Franchisor cares about our success", "Franchisor is competent", 
    "Franchisor is fair", "Franchisor is professional", "Franchisor is transparent", 
    "Franchisor is trustworthy", "Franchisor listens to Franchisees", 
    "Get involved in new initiatives", "Get practical support", "Get relevant support", 
    "Have access to useful data", "Have confidence in Franchisor", 
    "Have work satisfaction", "Leadership committed to future", "Long term goals exist", 
    "Love the brand", "Make a meaningful contribution", "Making a reasonable profit", 
    "Marketing attracts customers", "Marketing benefits the business", 
    "Meetings are motivating", "Network is adapting to market", "Network is innovative", 
    "No serious conflict with Franchisor", "Not emotionally drained by business", 
    "Not hassled by the business", "On track for financial success", 
    "Ongoing opportunities to upskill", "Participate in meetings & events", 
    "Processes exist to resolve conflict", "Proud of reputation", 
    "Share ideas for improvement", "There is a proven business model", 
    "There is clear future direction", "There is strategy to protect future", 
    "Would buy franchise again"), class = "factor"), variable = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("SD", 
    "D", "A", "SA"), class = "factor"), value = c(6.98, 9.09, 15.56, 
    6.82, 2.44, 6.67, 6.67, 11.36, 4.44, 11.36, 17.78, 2.33, 41.86, 
    43.18, 35.56, 50, 41.46, 28.89, 40, 20.45, 24.44, 20.45, 64.44, 
    20.93, 48.84, 47.73, 44.44, 40.91, 53.66, 55.56, 51.11, 63.64, 
    66.67, 59.09, 13.33, 74.42, 2.33, 0, 4.44, 2.27, 2.44, 8.89, 
    2.22, 4.55, 4.44, 9.09, 4.44, 2.33)), row.names = c(NA, -48L), .Names = c("dimension", 
    "area", "label", "variable", "value"), class = "data.frame")

Following the advice from another thread, I then calculate the label positions and create the labels themselves:

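library(plyr)
library(ggplot2)
# Midpoint of each segment within a bar: running total minus half the segment
data <- ddply(data, .(label), transform, pos = cumsum(value) - 0.5 * value)
# Any position below 5 is set to 4
data$pos[data$pos < 5] <- 4
# Percentage labels rounded to whole numbers
data$x_label <- paste0(sprintf("%.0f", data$value), "%")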
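To make the position calculation concrete, here is a minimal base R sketch on a made-up data frame (`toy` and its values are illustrative assumptions, not the survey data). It uses `ave()` instead of `ddply()` so that it runs without plyr, and it assumes the rows of each bar are already ordered SD, D, A, SA.

```r
# Toy data (made-up values, not the survey data): two bars, four segments each.
toy <- data.frame(
  label    = rep(c("Bar 1", "Bar 2"), each = 4),
  variable = factor(rep(c("SD", "D", "A", "SA"), times = 2),
                    levels = c("SD", "D", "A", "SA")),
  value    = c(10, 40, 45, 5, 2, 3, 90, 5)
)

# Midpoint of each segment: running total up to and including the segment,
# minus half of the segment's own width. ave() applies this within each bar.
toy$pos <- ave(toy$value, toy$label,
               FUN = function(v) cumsum(v) - 0.5 * v)

print(toy)
```

In "Bar 2" the first two midpoints come out as 1 and 3.5, only 2.5 units apart, which is the overlap case described above. The `pos < 5` rule would set both of them to 4, so in this toy bar they would sit exactly on top of each other. These midpoints also assume that segments are stacked from zero in factor order. As far as I know, ggplot2 2.2.0 and later reverse the stacking order, so with a newer version it may be simpler to map `y = value` in `geom_text()` and use `position = position_fill(vjust = 0.5)` to centre the labels.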

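Then I create the plot:

heat_plot <- ggplot(data, aes(x = label, y = value, fill = variable)) +
    geom_bar(stat = "identity", position = "fill") +
    scale_fill_manual(values = c(SD = "dark red", D = "firebrick3", A = "chartreuse3", SA = "dark green")) +
    coord_flip() +
    # Explicit breaks so the five labels always line up with the axis ticks
    scale_y_continuous(breaks = seq(0, 1, 0.25), labels = c(0, 25, 50, 75, 100)) +
    # "none" replaces the deprecated fill = FALSE
    guides(fill = "none") +
    theme(axis.title.x = element_blank(), axis.title.y = element_blank()) +
    # pos is on a 0-100 scale; position = "fill" rescales the bars to 0-1
    geom_text(aes(y = pos/100, label = x_label))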

heat_plot

This produces the following plot, which looks good, though I am worried about overlapping labels. As you can see, the first two labels on the fourth bar from the top are already getting close. Any ideas for cleanly separating all the labels? Is there an automated way to do this?
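One possible direction for the "minimum buffer" idea is a small two-pass nudge applied to the label positions of each bar. The sketch below is only an illustration: `spread_labels()`, `min_gap`, `lower` and `upper` are hypothetical names, not part of ggplot2 or plyr. It assumes the positions within a bar are already sorted in increasing order and that there is enough room for all labels.

```r
# Hypothetical helper: nudge label positions apart within one bar.
# `pos` holds the ideal positions in increasing order, `min_gap` is the
# smallest allowed distance between neighbours, `lower`/`upper` bound the axis.
spread_labels <- function(pos, min_gap, lower = 0, upper = 100) {
  n <- length(pos)
  if (n == 0) return(pos)
  # Assumption: sorted input, and enough room for n labels at min_gap spacing.
  stopifnot(!is.unsorted(pos), (n - 1) * min_gap <= upper - lower)
  pos[1] <- max(pos[1], lower)
  # Forward pass: push each label right until it clears its left neighbour.
  for (i in seq_len(n)[-1]) {
    pos[i] <- max(pos[i], pos[i - 1] + min_gap)
  }
  # Backward pass: if labels ran past the upper bound, pull them back left.
  pos[n] <- min(pos[n], upper)
  for (i in rev(seq_len(n - 1))) {
    pos[i] <- min(pos[i], pos[i + 1] - min_gap)
  }
  pos
}

# Two thin segments at the start of a bar: positions become 1, 6, 50, 97.5
spread_labels(c(1, 3.5, 50, 97.5), min_gap = 5)

# Three thin segments at the end of a bar: positions become 50, 90, 95, 100
spread_labels(c(50, 96, 98, 99), min_gap = 5)
```

With the data frame above, something like `data$pos <- ave(data$pos, data$label, FUN = function(p) spread_labels(p, min_gap = 5))` should apply it per bar, provided the rows of each bar are in stacking order. The trade-off is that a nudged label can end up outside its own segment, so very thin segments will still be hard to read.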

I think this would be hard to automate, as the issues depend on the data: if two adjacent categories are both small, the labels will almost by definition overlap unless you enlarge the plot. Is there a reason you need all the labels in your plot? For me, they don't really aid in the comprehension/conveyance of the point you're trying to make. — Heroka, Aug 24 '15 at 09:25
Nice job on creating the question! Following @Heroka's thinking, what if you created a condition on the labels so that, for example, if they are less than 10% they do not appear? You won't have an overlap problem, readers will know the percentages are small, and they can estimate from the x-axis break values. By the way, if you label every bar segment, why bother with the x-axis values? — lawyeR, Aug 24 '15 at 13:29
@lawyeR Thanks for the feedback. The reason for having it there was convention (terrible reason, I agree). I typically use the values to describe the red vs green, so might just have two values, collapsing across +ve and -ve (e.g. sum of greens, sum of reds) and put the values at either extreme on each bar. Basically this then says (for the bottom bar for example) "31% disagreed, 68% agreed". — Alex, Aug 24 '15 at 23:53
@Heroka I can only notify one user per comment. Thanks, and please see above — Alex, Aug 24 '15 at 23:53
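The two suggestions from the comments (hiding labels for small segments, and collapsing to one "disagree" and one "agree" total per bar) could be sketched in base R roughly as follows. The data frame `toy`, the 10% cutoff and the `side` column are illustrative assumptions, not part of the original data.

```r
# Toy data (made-up values, not the survey data): two bars, four segments each.
toy <- data.frame(
  label    = rep(c("Bar 1", "Bar 2"), each = 4),
  variable = factor(rep(c("SD", "D", "A", "SA"), times = 2),
                    levels = c("SD", "D", "A", "SA")),
  value    = c(10, 40, 45, 5, 2, 3, 90, 5)
)

# Suggestion 1: blank out the label of any segment below 10%.
toy$x_label <- ifelse(toy$value < 10, "",
                      paste0(sprintf("%.0f", toy$value), "%"))

# Suggestion 2: collapse SD + D into "disagree" and A + SA into "agree",
# giving two totals per bar instead of four labels.
toy$side <- ifelse(toy$variable %in% c("SD", "D"), "disagree", "agree")
totals <- aggregate(value ~ label + side, data = toy, FUN = sum)

print(toy)
print(totals)
```

Empty strings are drawn as nothing by `geom_text()`, so the first variant needs no other change to the plot. For the second, the totals would presumably go in a separate `geom_text()` layer with its own `data` argument, placed at the two ends of each bar.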
