
I managed to create a Pareto chart, but I would like to improve a few things and lack the skills to do so. Could someone have a quick look at the graph and let me know:

  1. On the right y-axis, which shows the cumulative frequencies (%), can I have the percentage symbol after the numbers? That way I could remove the axis title, which would be great.

  2. If number 1 is not possible, how can I make the right y-axis title bigger? Setting size = 12 does not seem to be accepted there, and I am not sure how it would work. I was also considering rotating the title, but again I am not sure if this is doable.

  3. Is it possible to rotate the A,B,C,D... labels so that they are not vertical but horizontal?

  4. I was wondering if adding relative frequencies above the bars is an option, as well as percentages above the dots on the red curve, which represent the cumulative frequencies?

Minimal example

set.seed(42)  ## for sake of reproducibility
c <- data.frame(value=factor(paste("value", 1:n)),counts=sample(18:130, n, replace=TRUE))

Cumulative frequencies for the Pareto chart

# It's maybe not the most elegant way of doing it but it works
# If someone can offer an alternative, that would be nice

df <- data.frame(c,stringsAsFactors = FALSE)

df <- df[order(df$counts,decreasing=TRUE), ]

df$value <- factor(df$value, levels=df$value)

df$cumulative <- cumsum(df$counts)

df$cumulative <- 100 * df$cumulative/tail(df$cumulative, n=1)

scaleRight <- tail(df$cumulative, n=1)/head(df$counts, n=1)

Pareto chart in ggplot

ggplot(df, aes(x=value)) +  theme_bw()+
  geom_bar(aes(y=counts, fill=value), stat="identity",show.legend = FALSE) +
  geom_path(aes(y=cumulative/scaleRight, group=1),colour="red", size=0.9) +
  geom_point(aes(y=cumulative/scaleRight, group=1),colour="red") +
  scale_y_continuous(sec.axis = sec_axis(~.*scaleRight, name = "Cumulative (%)"), n.breaks = 9) +
  theme(axis.text.x = element_text(angle=90, vjust=0.6)) +
  theme(
        legend.title = element_blank(),
        plot.title = element_text(hjust = 0.5),
        panel.background =element_blank(),panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), axis.title.x=element_blank(),
        axis.text.x = element_text(size=12),
        axis.text.y = element_text(size=12)) +
  scale_color_grey(start=0, end=.6)+scale_fill_grey()+ ylab("Counts")

Output

Pareto Chart

Jo-Achna
  • 1) `sec_axis(..., labels = scales::percent)`. 3) comment out the 1st `theme`. 2) Have you tried `axis.text.y.right = element_text(size=12)`? – Rui Barradas Feb 08 '21 at 11:32
  • `axis.text.y.right = element_text(size=12)` this will make the title bigger only on the left y-axis but not on the right y-axis. – Jo-Achna Feb 08 '21 at 12:55

1 Answer


I love your question, you have put a great deal of effort into asking a good question with a reproducible example and working code (except n wasn't defined, but usually I can count to 7).

First off, I have taken the liberty of refactoring your data manipulation code using tidyverse's dplyr, which makes it much more succinct to read. I also avoided multiplying your cumulative percentage by 100, and you will see why. Also, I didn't get the same values as you did.

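library(dplyr)    # provides %>%, arrange(), desc(), mutate()
library(ggplot2)  # needed for the plot further down
set.seed(42)  ## for sake of reproducibility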
n <- 6
c <- data.frame(value=factor(paste("value", 1:n)),counts=sample(18:130, n, replace=TRUE))
dput(c)
structure(list(value = structure(1:6, .Label = c("value 1", "value 2", 
"value 3", "value 4", "value 5", "value 6"), class = "factor"), 
    counts = c(66L, 118L, 82L, 42L, 91L, 117L)), class = "data.frame", row.names = c(NA, 
-6L))

df <- c %>%
  arrange(desc(counts)) %>%
  mutate(
    value = factor(value, levels=value),
    cumulative = cumsum(counts) / sum(counts)
  ) 

df
    value counts cumulative
1 value 2    118  0.2286822
2 value 6    117  0.4554264
3 value 5     91  0.6317829
4 value 3     82  0.7906977
5 value 1     66  0.9186047
6 value 4     42  1.0000000

I assume the A, B, C, D labels you are referring to are the x-axis labels. They are rotated by a quarter turn because of the following command in your own code; the angle=90 is what causes it. Remove that line and the labels are drawn horizontally again.

theme(axis.text.x = element_text(angle=90, vjust=0.6))
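
To make the effect of the angle visible in isolation, here is a minimal sketch with a bare bar chart. The data are copied from the dput() output above; the object name p and the explicit angle = 0 are only for illustration (0 is the default, so leaving the element out gives the same result).

```r
library(ggplot2)

# Data copied from the dput() output above, already sorted by counts.
df <- data.frame(
  value  = c("value 2", "value 6", "value 5", "value 3", "value 1", "value 4"),
  counts = c(118, 117, 91, 82, 66, 42)
)
df$value <- factor(df$value, levels = df$value)  # keep the sorted order on the x-axis

p <- ggplot(df, aes(x = value, y = counts)) +
  geom_col() +
  theme_bw() +
  # angle = 0 is the default; written out here to contrast with angle = 90
  theme(axis.text.x = element_text(angle = 0, hjust = 0.5, size = 12))

print(p)
```

If the labels get long enough to overlap, an intermediate angle such as 45 may be a reasonable compromise.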

All in all, I propose the following solution:

f <- max(df$counts) # or df$counts[1], as df is sorted in descending order

ggplot(df, aes(x=value)) +  theme_bw(base_size = 12)+
  geom_bar(aes(y=counts, fill=value), stat="identity",show.legend = FALSE) +
  geom_path(aes(y=cumulative*f, group=1),colour="red", size=0.9) +
  geom_point(aes(y=cumulative*f, group=1),colour="red") +
  scale_y_continuous("Counts", sec.axis = sec_axis(~./f, labels = scales::percent), n.breaks = 9) +
  scale_fill_grey() +
  theme(
    axis.text = element_text(size=12),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.title.x=element_blank()
  )
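
The role of f may be easier to see with the numbers written out. The sketch below uses base R only and the counts from the example data; the variable names y_plotted and back are just illustrative. It shows that the red curve is drawn in count units (cumulative * f) and that the secondary axis undoes this with ~ . / f, so 100% lines up with the top of the tallest bar.

```r
# Counts from the example data, sorted in descending order.
counts <- c(118, 117, 91, 82, 66, 42)

cumulative <- cumsum(counts) / sum(counts)  # fractions between 0 and 1
f <- max(counts)                            # height of the tallest bar

y_plotted <- cumulative * f  # where the red points sit on the primary (counts) axis
back      <- y_plotted / f   # what sec_axis(~ . / f) reads off for those positions

# The round trip recovers the original fractions.
print(isTRUE(all.equal(back, cumulative)))

# The last point sits exactly at the height of the tallest bar, i.e. at 100%.
print(y_plotted[length(y_plotted)] == f)

# Labels as they would appear with one decimal.
print(sprintf("%.1f%%", 100 * back))
```

Any other positive factor would work as long as the same one is used in both places; max(counts) is simply a convenient choice because it makes the two axes end at the same height.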


In response to questions:

Adding labels can be done with geom_text:

geom_text(aes(label=sprintf('%.0f%%', cumulative*100), y=cumulative*f), colour='red', nudge_y = 5) +
geom_text(aes(label=sprintf('%.0f%%', counts/sum(counts)*100), y=counts), nudge_y = 5) +

Note the use of nudge_y. It can be tricky, because it is expressed in units of the primary y-axis: nudging by 5 units makes sense here, where the tallest bar is around 120, but if your counts were in the thousands, 5 would be far too small to lift the labels clear of the bars and points.
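
One way around a hard-coded nudge is to derive it from the data. The following is a sketch under that assumption, not the only option: the name nudge and the 4% factor are arbitrary choices of mine, and the data are again copied from the dput() output above.

```r
library(ggplot2)

# Data copied from the dput() output above, already sorted by counts.
df <- data.frame(
  value  = c("value 2", "value 6", "value 5", "value 3", "value 1", "value 4"),
  counts = c(118, 117, 91, 82, 66, 42)
)
df$value <- factor(df$value, levels = df$value)
df$cumulative <- cumsum(df$counts) / sum(df$counts)

f <- max(df$counts)
nudge <- 0.04 * f  # assumption: 4% of the tallest bar; scales with the data

p <- ggplot(df, aes(x = value)) +
  theme_bw(base_size = 12) +
  geom_col(aes(y = counts, fill = value), show.legend = FALSE) +
  geom_path(aes(y = cumulative * f, group = 1), colour = "red") +
  geom_point(aes(y = cumulative * f), colour = "red") +
  # cumulative percentages above the red points
  geom_text(aes(label = sprintf("%.0f%%", cumulative * 100), y = cumulative * f),
            colour = "red", nudge_y = nudge) +
  # relative frequencies above the bars
  geom_text(aes(label = sprintf("%.0f%%", counts / sum(counts) * 100), y = counts),
            nudge_y = nudge) +
  scale_y_continuous("Counts",
                     sec.axis = sec_axis(~ . / f, labels = scales::percent)) +
  scale_fill_grey()

print(p)
```

Another option is to drop nudge_y and use vjust in geom_text (for example vjust = -0.5), which shifts the text relative to its own height rather than in data units.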

Please note that the solutions given here only work as long as c (and df) contains the entire scope of values; i.e. if you have 8 or 10 or more faults, but only want to show the 6 main faults, the cumulative sums and percentages will be wrong if they are calculated from those 6 rows alone.
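
If you do want to show only the top few categories, one possible approach is to compute the percentages on the full data first and cut the table down afterwards. The sketch below assumes a hypothetical data frame all_faults with 8 categories (the two extra counts are made up for illustration) and keeps the top 6.

```r
library(dplyr)

# Hypothetical full data: 8 categories, of which only the top 6 will be shown.
all_faults <- data.frame(
  value  = paste("value", 1:8),
  counts = c(66, 118, 82, 42, 91, 117, 30, 20)
)

df_top <- all_faults %>%
  arrange(desc(counts)) %>%
  mutate(
    value      = factor(value, levels = value),
    relative   = counts / sum(counts),         # share of ALL faults
    cumulative = cumsum(counts) / sum(counts)  # running share of ALL faults
  ) %>%
  head(6) %>%     # filter only after the percentages are computed
  droplevels()    # drop the factor levels of the rows that were cut

print(df_top)
```

With this, the red curve ends below 100% (here at 516/566, about 91%), which correctly reflects that the displayed categories do not cover everything.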

MrGumble
  • Thank you for taking the time to address my question and for offering a more elegant solution with all the explanations. It will help me to understand your line of thought. The graph looks exactly how I wanted it (I will still remove the left y-axis title "Counts", but that is easy). Thank you for pointing out that I actually (ugh....) put in the rotation of the labels myself; I simply overlooked it. Would it still be possible to put the cumulative frequencies above the red dots and the relative frequencies above the bars? – Jo-Achna Feb 09 '21 at 09:39
  • I will double-check the values on my end to see what eventually went wrong there. – Jo-Achna Feb 09 '21 at 09:41
  • I know what went wrong, I attached a graph when n wasn't 7 but of a different value, so all is good here – Jo-Achna Feb 09 '21 at 09:48