Problem overview
When representing data in graphs like pie charts or 100% stacked column/bar charts, I typically like to add data labels with the absolute and percentage values of each category. However, there are many cases where the percentages in those labels don't add up to 100% due to rounding. Is there any way to fix this?
Example and code
library(tidyverse)
# Creating a small dataset
df = data.frame(categories = c('Cat1','Cat2','Cat3','Cat4'),
                values = c(2200,4700,3000,2000)) %>%
  mutate(perc = values / sum(values))
# Creating the data label text.
# This is the step where I need to make a change. More specifically, in the `label_perc` section.
df = df %>% mutate(label_values = format(values,
                                         big.mark = ",",
                                         decimal.mark = ".",
                                         scientific = FALSE),
                   label_perc = sprintf("%0.0f%%",
                                        perc*100),
                   data_label = paste(label_values,
                                      label_perc,
                                      sep='\n'))
# Generating the pie chart graph in ggplot2
p = ggplot(df, aes(x = "", y = values, fill = categories)) +
  geom_bar(width = 1,
           stat = "identity") +
  geom_text(aes(label = data_label),
            position = position_stack(vjust = 0.5)) +
  coord_polar(theta = "y") +
  theme_void()
Notice how the percentages don't add up to 100%: 17% + 25% + 39% + 18% = 99%.
Is there a way to generate these data labels so that the rounded percentages still add up to 100%?
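To make the mismatch concrete, here is a quick check on the same four values used in the example. It only relies on base R; the variable name `rounded` is just an illustrative choice.

```r
# Same values as in the example dataset
values <- c(2200, 4700, 3000, 2000)
perc <- values / sum(values)

# Round each percentage independently, as the labels do
rounded <- round(perc * 100)
rounded       # 18 39 25 17
sum(rounded)  # 99, not 100
```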
My trick for Excel
The same problem happens when I'm working in Excel. When it does, I just create a new column with the rounded percentages and then, for the last category, instead of using the `ROUND()` function, I use `1 - SUM(...)` applied to the other rounded percentages, so the last category absorbs the rounding difference.
This works great in Excel, but I'm not quite sure how to translate this solution into R.
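For reference, a direct translation of the Excel trick might look roughly like the sketch below. It assumes the last category is the one that should absorb the rounding difference, and `round_last_absorbs` is a hypothetical helper name, not a function from any package.

```r
# Hypothetical helper: round each percentage, then let the last
# element take whatever is left so the total is exactly 100
round_last_absorbs <- function(values, digits = 0) {
  perc <- round(values / sum(values) * 100, digits)
  n <- length(perc)
  # Equivalent of Excel's 1 - SUM(...) on the other rounded values
  perc[n] <- 100 - sum(perc[-n])
  perc
}

values <- c(2200, 4700, 3000, 2000)
perc_adj <- round_last_absorbs(values)
perc_adj       # 18 39 25 18
sum(perc_adj)  # 100

# Build the label text from the adjusted percentages
label_perc <- sprintf("%0.0f%%", perc_adj)
```

Inside the existing pipeline, this could presumably replace the `label_perc` step with something like `label_perc = sprintf("%0.0f%%", round_last_absorbs(values))`. As in Excel, the last category carries the entire rounding error (here it is shown as 18% even though its exact share rounds to 17%), so a largest-remainder approach may be preferable if that distortion matters.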