
I have a data frame and am interested in the relationship between two categorical variables, Type and Location. Type has 5 levels and Location has 20 levels.

I want to plot the percentage of each Type for each Location. Is there a concise way of doing this with ggplot2?

In my case the variable on the x axis has 20 levels, so I am also running into spacing issues. Any help would be appreciated.

EDIT: A more concrete example:

df
   gender beverage
1  Female     coke
2    Male     bear
3    Male     coke
4  Female     bear
5    Male      tea
6    Male     bear
7  Female    water
8  Female      tea
9  Female     bear
10   Male      tea

I want to plot the gender-wise percentage for each beverage. For example, there are 3 tea drinkers, of which 2 are male and 1 is female, so the male percentage would be 66.67 and the female percentage would be 33.33. On the x axis, at the position corresponding to tea, there should therefore be two bars: male with y = 66.67 and female with y = 33.33.

Vikash Balasubramanian
  • The chances of getting a useful answer will be much higher if you would include a [good example](http://stackoverflow.com/questions/5963269). – Axeman Sep 02 '16 at 09:05
  • Well, my own data is quite large and I don't have the data for the graph I provided, so it would be helpful if you could clarify what kind of example I should give. Isn't the plot enough to understand what I want done? – Vikash Balasubramanian Sep 02 '16 at 09:09
  • Use an excerpt of your data, or some made up random data, or an inbuilt dataset (read the link I posted). Also, if you are able to do it with preprocessing, it would be useful to actually show that code. – Axeman Sep 02 '16 at 09:13
  • @Axeman I removed the graph because it didn't accurately represent what I wanted to do, and I realized that the way I did it using preprocessing was also wrong. I have edited the question to specify what I want to do. – Vikash Balasubramanian Sep 02 '16 at 09:33

1 Answer


The easiest way is to pre-process the data, since the percentages have to be calculated separately for each gender. I use `complete` to make sure the zero-percent bars are explicitly present in the data frame; otherwise ggplot will drop that bar and widen the other gender's bar.

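library(dplyr)
library(tidyr)
library(ggplot2)

# Count each gender/beverage pair, add zero-count rows for missing beverages,
# then compute each beverage's share within its gender.
df2 <- df %>% 
  group_by(gender, beverage) %>% 
  tally() %>% 
  complete(beverage, fill = list(n = 0)) %>% 
  mutate(percentage = n / sum(n) * 100)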

ggplot(df2, aes(beverage, percentage, fill = gender)) + 
  geom_bar(stat = 'identity', position = 'dodge') +
  theme_bw()


Or the other way around, with percentages calculated within each beverage:

# Same steps with the roles swapped: each gender's share within its beverage.
df3 <- df %>% 
  group_by(beverage, gender) %>% 
  tally() %>% 
  complete(gender, fill = list(n = 0)) %>% 
  mutate(percentage = n / sum(n) * 100)

ggplot(df3, aes(beverage, percentage, fill = gender)) + 
  geom_bar(stat = 'identity', position = 'dodge') +
  theme_bw()
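
For reference, here is a self-contained sketch of the second variant that also addresses the crowded x axis from the question. It rests on a few assumptions of mine rather than anything confirmed in the thread: the columns are plain character vectors (the default for `data.frame()` since R 4.0), in which case, as far as I understand, a grouped `complete()` only sees the values present in each group, so I complete both columns on the ungrouped counts instead. The names `df4` and `p` are just illustrative, and `geom_col()` needs ggplot2 2.2.0 or later.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Example data from the question, stored as character columns (assumption).
df <- data.frame(
  gender = c("Female", "Male", "Male", "Female", "Male",
             "Male", "Female", "Female", "Female", "Male"),
  beverage = c("coke", "bear", "coke", "bear", "tea",
               "bear", "water", "tea", "bear", "tea"),
  stringsAsFactors = FALSE
)

df4 <- df %>%
  count(beverage, gender) %>%                         # ungrouped counts per pair
  complete(beverage, gender, fill = list(n = 0)) %>%  # add missing pairs with n = 0
  group_by(beverage) %>%
  mutate(percentage = n / sum(n) * 100) %>%           # share of each gender per beverage
  ungroup()

# Rotating the x axis labels is one way to cope with many levels (e.g. 20 locations).
p <- ggplot(df4, aes(beverage, percentage, fill = gender)) +
  geom_col(position = "dodge") +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

With this data the tea rows come out as 33.33 for Female and 66.67 for Male, matching the numbers asked for in the question. If rotated labels are still too cramped, adding `coord_flip()` to put the categories on the vertical axis is another option.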


Axeman
  • +1 for the `complete` part, but this is actually not the percentage I want. Your plot tells me that 20% of males are tea drinkers; what I want to know is that 66.67% of tea drinkers are male, i.e. # of males drinking tea / # of tea drinkers * 100. How do I do that? It seems more difficult. – Vikash Balasubramanian Sep 02 '16 at 10:51
  • Why is that any more difficult? Just swap `beverage` and `gender` in `group_by`, and replace `beverage` with `gender` in `complete`. – Axeman Sep 02 '16 at 10:54