I'm trying to plot an overlaid bar chart of two professions and the probabilities of their incomes:

> data.frame(income = rep(c("$10 to $20", "$20 to $30", "$30 to $40"), 2), profession=c("A", "A", "A", "B", "B", "B"), prob=c(10, 50, 40, 20, 50, 30))
      income profession prob
1 $10 to $20          A   10
2 $20 to $30          A   50
3 $30 to $40          A   40
4 $10 to $20          B   20
5 $20 to $30          B   50
6 $30 to $40          B   30

Unfortunately, this doesn't work well once the income brackets include values like "100", because the labels are sorted alphabetically, so the x-axis comes out as (10, 100, 20, 30).

When I have a single profession, I can use df$income <- factor(df$income, levels = df$income), but that doesn't work here:

> df$income <- factor(df$income, levels = df$income)
Error in `levels<-`(`*tmp*`, value = as.character(levels)) : 
  factor level [4] is duplicated

Is there any way around that?

This is how I'm trying to plot it:

ggplot(df, aes(x=income, y=prob, fill=profession)) + geom_bar(stat='identity', position='identity', alpha=0.5)
Lem0n
    If you don't have too many income brackets, you can use brute force and list them explicitly in your factor() call, e.g. levels = c(I1, I2, I3). Since your entries are strings, defining them explicitly shouldn't change your calculations. – questionmark Jul 18 '20 at 02:03

2 Answers

You need to specify the order of the levels explicitly, like this:

df$income <- factor(df$income, 
                    levels = c("$10 to $20", "$20 to $30", "$30 to $40", "$100"))
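
If the rows already appear in the intended order, as they do in the question's data frame, the levels may not need to be typed by hand. The error in the question comes from passing duplicated values as levels, so dropping the duplicates is enough. A minimal sketch, assuming the first appearance of each bracket reflects the order you want on the axis:

```r
# The question's data; stringsAsFactors = FALSE keeps income as character
# on R versions before 4.0 as well
df <- data.frame(
  income = rep(c("$10 to $20", "$20 to $30", "$30 to $40"), 2),
  profession = c("A", "A", "A", "B", "B", "B"),
  prob = c(10, 50, 40, 20, 50, 30),
  stringsAsFactors = FALSE
)

# unique() keeps values in order of first appearance and drops the
# repeats that caused "factor level [4] is duplicated"
df$income <- factor(df$income, levels = unique(df$income))
```

This only helps when the data are already sorted the way you want; otherwise the levels still have to be ordered some other way.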
Kota Mori

You can use gtools::mixedsort here, which sorts strings with embedded numbers in numeric order, to derive the factor levels directly from the values.

df$income <- factor(df$income, levels = unique(gtools::mixedsort(df$income)))

You can then plot it as usual.

library(ggplot2)
ggplot(df, aes(x=income, y=prob, fill=profession)) + 
  geom_bar(stat='identity', position='identity', alpha=0.5)
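
If adding the gtools dependency is not an option, a similar ordering can be sketched in base R by parsing the lower bound out of each label. This is an illustration rather than part of the answer above: the "$100 to $110" bracket and the names `labels` and `lower_bound` are assumptions, and the regular expression assumes every label contains at least one run of digits.

```r
# Hypothetical data with a three-digit bracket listed first,
# so alphabetical or first-appearance order would both be wrong
df <- data.frame(
  income = rep(c("$100 to $110", "$10 to $20", "$20 to $30"), 2),
  profession = rep(c("A", "B"), each = 3),
  prob = c(10, 50, 40, 20, 50, 30),
  stringsAsFactors = FALSE
)

# Distinct labels, then the first run of digits in each one,
# e.g. "$10 to $20" -> 10
labels <- unique(df$income)
lower_bound <- as.numeric(sub("^[^0-9]*([0-9]+).*$", "\\1", labels))

# Use the labels, ordered by their numeric lower bound, as factor levels
df$income <- factor(df$income, levels = labels[order(lower_bound)])

levels(df$income)
```

The resulting factor can be passed to the same ggplot call; ggplot2 places a factor on a discrete x-axis in level order.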
Ronak Shah