I'm making a bar chart to show the percentage of a sample that identifies with each of a fixed list of political parties across multiple years. That part works. The trouble is getting the percentage on the vertical axis to use each year's total count as its denominator; at the moment it uses the total count across all years.

In other words, the bars I'm generating add up to 100% but, considering this represents three years of data, I want them to add up to 300%. The sample size from each year varies, so multiplying vertical axis values times the number of years in the sample won't work.

ggplot(df.graph, aes(x=Answer, y=..count../sum(..count..), fill=Year)) +
  geom_bar(position="dodge")+ 
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))+
  theme(axis.text.x=element_text(angle=45,hjust=1))+
  xlab(NULL)+
  ylab(NULL)

Bar chart (I'm too new to Stack Overflow to post images, apparently)

divibisan
nnev
  • Welcome to Stack Overflow! Please provide a [reproducible example in r](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). The link I provided, will tell you how. Moreover, please take the [tour](https://stackoverflow.com/tour) and visit [how to ask](https://stackoverflow.com/help/how-to-ask). Cheers. – M-- Jan 07 '19 at 22:21
  • So you want a stacked bar plot? as you have it currently, the bars are representative of one year, so 100% would be suitable. – bob1 Jan 07 '19 at 22:28

1 Answer

Rather than using the default geom_bar(stat = "count"), compute the percentages yourself and plot them with geom_bar(stat = "identity"). The percentages are easy to calculate with dplyr. For example, consider the ggplot2::mpg data:

This dataset contains a subset of the fuel economy data that the EPA makes available on http://fueleconomy.gov. It contains only models which had a new release every year between 1999 and 2008 - this was used as a proxy for the popularity of the car.

-- https://ggplot2.tidyverse.org/reference/mpg.html

library(tidyverse) # loads dplyr, ggplot2 and the %>% pipe used below

ggplot2::mpg %>% select(manufacturer, year)
#> # A tibble: 234 x 2
#>    manufacturer  year
#>    <chr>        <int>
#>  1 audi          1999
#>  2 audi          1999
#>  3 audi          2008
#>  4 audi          2008
#>  5 audi          1999
#>  6 audi          1999
#>  7 audi          2008
#>  8 audi          1999
#>  9 audi          1999
#> 10 audi          2008
#> # ... with 224 more rows
  • manufacturer: manufacturer name
  • year: year of manufacture

1. Percentage versus year

You can calculate each manufacturer's share across years. In other words, within each manufacturer the proportions for the individual years sum to 1.

Also, you can use scales::percent instead of labels = function(x) paste0(x*100, "%").
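
As a quick illustration of that substitution, here is a minimal sketch comparing the question's labelling function with scales::percent. The vector props is a made-up example, and accuracy = 1 is passed explicitly so the rounding does not depend on the installed scales version.

```r
library(scales)

# Hypothetical proportions, chosen only for illustration
props <- c(0.25, 0.5, 1)

# Labelling function from the question
labels_custom <- paste0(props * 100, "%")

# scales equivalent; accuracy = 1 rounds to whole percentages
labels_scales <- percent(props, accuracy = 1)

identical(labels_custom, labels_scales)
```

In scale_y_continuous() the function itself is passed, as in labels = scales::percent, and ggplot2 calls it on the axis breaks.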

mpg %>% 
  group_by(manufacturer) %>% 
  mutate(N = n()) %>% # number of each manufacturer
  group_by(manufacturer, year) %>% # pair of manu, year
  summarise(perc = n() / unique(N)) %>% # n() = number of each pair => n()/N = proportion
  ggplot() +
  aes(x = manufacturer, y = perc, fill = factor(year)) +
  geom_bar(position = "dodge", stat = "identity") + # use y as y axis
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_blank()) +
  labs(fill = "Year")

For each manufacturer (each tick on the x axis), the two bars, one per year, add up to 100%.
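
To check that claim numerically, the sketch below recomputes the same proportions with count() followed by a grouped mutate(), which should give the same values as the n() / unique(N) approach, and then sums them per manufacturer. The names check and total are only illustrative.

```r
library(dplyr)

check <- ggplot2::mpg %>%
  count(manufacturer, year) %>%    # rows per manufacturer/year pair
  group_by(manufacturer) %>%
  mutate(perc = n / sum(n)) %>%    # denominator: that manufacturer's total
  summarise(total = sum(perc))     # one row per manufacturer

# Every manufacturer's proportions should sum to 1
all.equal(check$total, rep(1, nrow(check)))
```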


2. Percentage over year

On the other hand, you can compute the proportion of each manufacturer within each year, so that the proportions for each year sum to 1. This is the per-year denominator the question asks for.

mpg %>% 
  group_by(year) %>% 
  mutate(N = n()) %>% 
  group_by(manufacturer, year) %>% 
  summarise(perc = n() / unique(N)) %>% 
  ggplot() +
  aes(x = manufacturer, y = perc, fill = factor(year)) +
  geom_bar(position = "dodge", stat = "identity") +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        axis.title = element_blank()) +
  labs(fill = "Year")

Adding up all the bars of one colour (one year) gives 100%.
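
Applied to the question's own setup, the same idea might look like the sketch below. The column names Answer and Year come from the question's code, but the data frame contents here are invented, since the original data was not posted. It also uses geom_col(), which is shorthand for geom_bar(stat = "identity").

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for df.graph: one row per respondent,
# with a different sample size in each year (4, 6 and 10)
df.graph <- data.frame(
  Year   = factor(c(rep(2016, 4), rep(2017, 6), rep(2018, 10))),
  Answer = c("A", "A", "B", "C",
             "A", "B", "B", "B", "C", "C",
             rep("A", 5), rep("B", 3), rep("C", 2))
)

df.perc <- df.graph %>%
  count(Year, Answer, name = "n") %>%  # respondents per Year/Answer pair
  group_by(Year) %>%
  mutate(perc = n / sum(n)) %>%        # denominator: that year's total
  ungroup()

p <- ggplot(df.perc, aes(x = Answer, y = perc, fill = Year)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  xlab(NULL) +
  ylab(NULL)

print(p)
```

With this layout each year's bars sum to 100%, so three years add up to 300% in total, regardless of how the yearly sample sizes differ.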

younggeun