
I have a data frame called `employee_attrition`. Two of its variables interest me: `MonthlyIncome`, which holds continuous salary data, and `PerformanceRating`, which takes discrete values (1, 2, 3 or 4). I want to create a histogram of `MonthlyIncome` and show `PerformanceRating` in the same plot. So far I have this:

```r
ggplot(data = employee_attrition, aes(x = MonthlyIncome, fill = PerformanceRating)) +
  geom_histogram(aes(y = ..count..)) +
  xlab("Monthly salary (MonthlyIncome)") +
  ylab("Frequency") +
  ggtitle("Histogram: MonthlyIncome and Attrition") +
  theme_minimal()
```

The problem is that the plot does not show which `PerformanceRating` values make up each bar of the histogram.

My data frame is something like this:

```
  MonthlyIncome PerformanceRating
1          5993                 1
2          5130                 1
3          2090                 4
4          2909                 3
5          3468                 4
6          3068                 3
```
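To make the question reproducible (as the comments ask), the six rows above can be rebuilt directly in R. This sketch covers only the excerpt shown, not the real `employee_attrition` data, and it assumes `PerformanceRating` is stored as plain numbers, as the printed values suggest:

```r
# The six example rows shown above, rebuilt as a data frame
# (only the excerpt, not the full dataset)
employee_attrition <- data.frame(
  MonthlyIncome     = c(5993, 5130, 2090, 2909, 3468, 3068),
  PerformanceRating = c(1, 1, 4, 3, 4, 3)
)

# PerformanceRating is numeric here, so ggplot2 will treat it as a
# continuous variable when it is mapped to fill
str(employee_attrition)
```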

I want a histogram that shows the frequency of `MonthlyIncome`, with each bar split into 4 colours, one per `PerformanceRating` value.

Something like this [example image], but with 4 colours (one per PerformanceRating value).

LC-datascientist
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input and desired output that can be used to test and verify possible solutions. It's unclear what you expect this plot to look like. Are you looking for a stacked bar chart type appearance? Maybe you want `aes(x=MonthlyIncome, fill=factor(PerformanceRating))`? – MrFlick May 28 '21 at 02:50
  • MrFlick is almost certainly correct: if you want discrete fill colors, you need a discrete data type like `factor`. – Gregor Thomas May 28 '21 at 03:04
  • That said, stacked histograms can be very hard to read - I'd suggest using `facet_wrap(~PerformanceRating)` as well. – Gregor Thomas May 28 '21 at 03:05
  • I want each bar to contain 4 colors representing the frequencies of the PerformanceRating values, with the height of the entire bar representing the frequency of MonthlyIncome. – Ricardo Bonilla May 28 '21 at 03:07
  • I think you'll easily find your answer if you search the web. This [answer](https://stackoverflow.com/a/20184572/6288065) on Stack Overflow, for example, may answer your question. The how-to guide in this [link](https://www.delftstack.com/howto/r/stacked-histogram-in-r/) can be helpful, too. – LC-datascientist May 28 '21 at 03:14
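The diagnosis in the comments can be checked by building both versions of the plot and inspecting the layer data returned by `ggplot_build()`. The sketch below uses simulated data (the seed, sample size and income range are arbitrary assumptions, and `sim` is a hypothetical name). With a numeric `PerformanceRating`, ggplot2 creates no groups from a continuous variable, so `stat_bin()` bins all rows together and drops the varying `fill`; every bar then gets one default colour. Wrapping the variable in `factor()` gives one group, and one fill colour, per rating.

```r
library(ggplot2)

set.seed(1)  # arbitrary seed so the simulated data are reproducible
sim <- data.frame(
  MonthlyIncome     = sample(3000:6000, 500, replace = TRUE),
  PerformanceRating = sample(1:4, 500, replace = TRUE)
)

# Numeric fill: no groups are formed, stat_bin() bins all rows together
# and drops the non-constant fill (recent ggplot2 versions warn about it)
p_numeric <- ggplot(sim, aes(x = MonthlyIncome, fill = PerformanceRating)) +
  geom_histogram(bins = 20)

# Factor fill: one group, and one colour, per rating, stacked in each bin
p_factor <- ggplot(sim, aes(x = MonthlyIncome, fill = factor(PerformanceRating))) +
  geom_histogram(bins = 20)

fills_numeric <- unique(suppressWarnings(ggplot_build(p_numeric))$data[[1]]$fill)
fills_factor  <- unique(ggplot_build(p_factor)$data[[1]]$fill)

length(fills_numeric)  # a single default colour
length(fills_factor)   # one colour per PerformanceRating value
```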

1 Answer


For `fill` to produce discrete colours, you first need to convert the grouping variable to a factor.

```r
library(tidyverse)  # attaches ggplot2, dplyr and tibble

##---------------------------------------------------
## Create a sample dataset simulating your dataset
##---------------------------------------------------

set.seed(123)  # make the simulated data reproducible
employee_attrition <- tibble(
  MonthlyIncome = sample(3000:5993, 1000, replace = FALSE),
  PerformanceRating = sample(1:4, 1000, replace = TRUE)
)

##------------------------------------------
## Plot, converting PerformanceRating to a
## factor so that fill uses discrete colours
##------------------------------------------

employee_attrition %>%
  mutate(PerformanceRating = as.factor(PerformanceRating)) %>%
  ggplot(aes(x = MonthlyIncome, fill = PerformanceRating)) +
  # after_stat(count) replaces the deprecated ..count.. notation
  # (counts are already the default y for geom_histogram)
  geom_histogram(aes(y = after_stat(count)), bins = 20) +
  xlab("Monthly salary (MonthlyIncome)") +
  ylab("Frequency") +
  ggtitle("Histogram: MonthlyIncome by PerformanceRating") +
  theme_minimal()
```
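On the requirement from the comments that each bar's full height equal the overall frequency of MonthlyIncome in that bin: this is what the default `position = "stack"` of `geom_histogram()` gives, because all groups are binned against the same x-scale range and therefore share bin edges. The rough check below uses simulated data (the seed and the names `sim`, `p_stacked`, `p_total`, `p_facet` are assumptions). It compares the summed per-rating counts with an ungrouped histogram and also builds the `facet_wrap(~PerformanceRating)` layout suggested in the comments as an easier-to-read alternative:

```r
library(ggplot2)

set.seed(42)  # arbitrary seed; the data below are simulated
sim <- data.frame(
  MonthlyIncome     = sample(3000:5993, 1000, replace = FALSE),
  PerformanceRating = factor(sample(1:4, 1000, replace = TRUE))
)

# Stacked histogram: one fill colour per rating within each bin
p_stacked <- ggplot(sim, aes(x = MonthlyIncome, fill = PerformanceRating)) +
  geom_histogram(bins = 20)

# Same histogram without the fill mapping, i.e. the overall frequencies
p_total <- ggplot(sim, aes(x = MonthlyIncome)) +
  geom_histogram(bins = 20)

stacked <- ggplot_build(p_stacked)$data[[1]]
total   <- ggplot_build(p_total)$data[[1]]

# Sum the per-rating counts within each bin (bins identified by their centre x)
stacked_totals <- tapply(stacked$count, stacked$x, sum)

# TRUE if the stacked segments add up to the ungrouped bar heights
all.equal(as.numeric(stacked_totals), total$count[order(total$x)])

# Alternative from the comments: one small histogram per rating
p_facet <- p_stacked + facet_wrap(~PerformanceRating)
```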


Behnam Hedayat