
I wanted to make a plot using ggplot2 with bar charts showing the degrees (bars) held by people in each kind of urban/rural environment (facet). I achieved that.

Now I want to label each bar with the share of people holding that qualification within its facet. What I got using the code below is percentages of the whole population.

How can I change the code so that the percentages will be counted inside each facet?

Here is a sample with 1,000 rows from the data set I used: link.

library(ggplot2)
library(scales)

# plot urban/rural by degree in facets
myplot <- ggplot(data = si, aes(DEGREE))
myplot <- myplot + geom_bar()
myplot <- myplot + labs(title = "Degree by Urban/Rural", y = "Percent", x = "DEGREE")
myplot <- myplot + geom_text(
  aes(y = (..count..) / sum(..count..),
      label = scales::percent((..count..) / sum(..count..))),
  stat = "count", vjust = -0.25
)
myplot <- myplot + facet_wrap(~URBRURAL)
myplot <- myplot + theme(axis.text.x = element_text(angle = 20, hjust = 1))
myplot


DSC
  • can you share sample data? – Sandipan Dey Nov 24 '16 at 18:26
  • A lot of potential duplicates out there, including [here](http://stackoverflow.com/questions/4725339/percentage-on-y-lab-in-a-faceted-ggplot-barchart), [here](http://stackoverflow.com/questions/9614720/obtaining-percent-scales-reflective-of-individual-facets-with-ggplot2), and [here](http://stackoverflow.com/questions/12236160/ggplot-sum-percentages-for-each-facet-respect-fill). Have you tried any of the options in those answers? – aosmith Nov 24 '16 at 18:45
  • [Here is a sample of my data](http://s000.tinyupload.com/?file_id=02792393224272274158) – DSC Nov 24 '16 at 19:08
  • I checked those answers, it didn't help. – DSC Nov 24 '16 at 19:20
  • You have renamed the y-axes and changed the scale to percent but the height of the bars is still counts. Is this really your intention? – Uwe Nov 25 '16 at 07:45

2 Answers

4

You can always transform your data to calculate what you want prior to plotting it. I added some tweaks as well (labels at the top of the bar, string wrapping on the x-axis, axis limits and labels).

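library(dplyr)
library(ggplot2)
library(scales)   # percent()
library(stringr)  # str_wrap()

# Count each URBRURAL/DEGREE combination. tally() drops the last grouping
# level, so sum(n) below is the total within each URBRURAL facet.
# `si` is the data frame from the question.
plot_data <- si %>% 
  group_by(URBRURAL, DEGREE) %>% 
  tally() %>% 
  mutate(percent = n / sum(n))

ggplot(plot_data, aes(x = DEGREE, y = percent)) +
  geom_bar(stat = "identity") +
  geom_text(aes(label = scales::percent(percent)), vjust = -0.5) +
  labs(title = "Degree by Urban/Rural", y = "Percent", x = "DEGREE") +
  scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
  scale_x_discrete(labels = function(x) str_wrap(x, 10)) +
  facet_wrap(~URBRURAL)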
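
To see what the transformation computes, here is a minimal sketch on made-up data. The `toy` data frame and its values are hypothetical, and the sketch uses base R (`table()` and `ave()`) rather than the dplyr pipeline above, so it is a cross-check of the idea and not the answer's exact code. One difference to keep in mind: `table()` also lists combinations with a count of zero, which `tally()` would omit.

```r
# Hypothetical toy data standing in for the question's `si` data frame.
toy <- data.frame(
  URBRURAL = c("Town", "Town", "Town", "Town", "Farm", "Farm"),
  DEGREE   = c("None", "None", "Lowest", "University", "None", "Lowest")
)

# Count each URBRURAL/DEGREE combination (the analogue of group_by() + tally()).
counts <- as.data.frame(
  table(URBRURAL = toy$URBRURAL, DEGREE = toy$DEGREE),
  responseName = "n", stringsAsFactors = FALSE
)

# Divide each count by the total of its own URBRURAL group,
# not by the grand total of the whole data set.
counts$percent <- counts$n / ave(counts$n, counts$URBRURAL, FUN = sum)

# The shares within each facet should add up to 1.
facet_totals <- tapply(counts$percent, counts$URBRURAL, sum)

print(counts)
print(facet_totals)
```

If every entry of `facet_totals` equals 1, the percentages are being computed inside each facet, which is what the question asks for.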


Jake Kaupp
  • Clean & straightforward solution for the computation and improved graphics. However, the code you provided still prints slanted labels on the x-axis and not wrapped as depicted. Please, can you add the code for wrapping the labels - Thank you. – Uwe Nov 25 '16 at 07:54
  • Thank you Jake. `scales_x_discrete()` creates an error. It should read `scale_x_discrete()`. Unfortunately, I'm not allowed to do edits of less than 6 characters so can you kindly correct the typo yourself - Thank you. – Uwe Nov 26 '16 at 07:17
  • Typo is now fixed – Jake Kaupp Nov 26 '16 at 12:50
0

This works I think:

library(ggplot2)

si <- read.csv('sampledata.csv', sep = ' ')
myplot <- ggplot(data = si, aes(DEGREE))
myplot <- myplot + geom_bar()
myplot <- myplot + labs(title = "Degree by Urban/Rural", y = "Percent", x = "DEGREE")
# Divide each bar's count by the total count of its own facet (PANEL)
# instead of by the grand total.
myplot <- myplot + geom_text(
  aes(y = (..count..) / tapply(..count.., ..PANEL.., sum)[..PANEL..],
      label = scales::percent((..count..) / tapply(..count.., ..PANEL.., sum)[..PANEL..])),
  stat = "count", vjust = -0.25
)
myplot <- myplot + facet_wrap(~URBRURAL)
myplot <- myplot + theme(axis.text.x = element_text(angle = 20, hjust = 1))
myplot
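
The `tapply(...)[..PANEL..]` expression is the only change from the question's code, so here is a small sketch of what it does outside of ggplot2. The `count` and `panel` vectors are hypothetical stand-ins for the per-bar counts and facet indices that `stat = "count"` works with; the sketch assumes, as the code above does, that the facet index is available as a factor.

```r
# Hypothetical per-bar counts: one entry per bar, plus the facet (panel)
# each bar belongs to. These values are made up for illustration.
count <- c(2, 1, 1, 1, 1)
panel <- factor(c(1, 1, 1, 2, 2))

# Denominator used in the question: one grand total shared by every bar.
global_share <- count / sum(count)

# Denominator used in this answer: tapply() sums the counts per panel, and
# indexing the result by `panel` repeats each panel's total once per bar.
panel_totals <- tapply(count, panel, sum)
panel_share  <- count / panel_totals[panel]

print(round(global_share, 3))
print(as.vector(panel_share))
```

With the grand total as denominator the shares add up to 1 across all facets together; with the per-panel totals they add up to 1 inside each facet.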


Note that the y-axis labels are not percentages but actual counts, as they were in your original figure; the labels on the bars are the percentages. Row 18 of the output below shows this: 45 is not a percentage but the actual count of that group in the sample data you provided, whereas the 15.7% on the same bar in the corresponding facet is the percentage.

library(dplyr)
as.data.frame(si %>% group_by(URBRURAL, DEGREE) %>% summarise(n=n()))

                                   URBRURAL                                            DEGREE  n
1  Country village, other type of community Above higher secondary level, other qualification  6
2  Country village, other type of community                        Above lowest qualification 16
3  Country village, other type of community                        Higher secondary completed  9
4  Country village, other type of community                       Lowest formal qualification 31
5  Country village, other type of community                           No formal qualification 20
6  Country village, other type of community                       University degree completed  1
7               Farm or home in the country                        Above lowest qualification  1
8               Farm or home in the country                        Higher secondary completed  1
9               Farm or home in the country                       Lowest formal qualification  5
10              Farm or home in the country                           No formal qualification  1
11              Farm or home in the country                       University degree completed  1
12           Suburb, outskirt of a big city Above higher secondary level, other qualification 45
13           Suburb, outskirt of a big city                        Above lowest qualification 57
14           Suburb, outskirt of a big city                        Higher secondary completed 75
15           Suburb, outskirt of a big city                       Lowest formal qualification 48
16           Suburb, outskirt of a big city                           No formal qualification 23
17           Suburb, outskirt of a big city                       University degree completed 15
18                       Town or small city Above higher secondary level, other qualification 45
Sandipan Dey
  • The height of the bars doesn't match the percentage scale, e.g., the first bar in the lower left facet ("Town or small city") is labelled 15.7% but the bar reaches as high as 45%. – Uwe Nov 25 '16 at 07:38
  • Actually the y-axis labels are not percentages but *actual counts*, as they were in your original figure; the labels on the bars represent *percentages*. Updated the post. – Sandipan Dey Nov 25 '16 at 07:47