
I am new to R and have to use it for a course at uni. I am aiming to make a bar chart like the one pictured: bins of income along the x-axis, with two bars for each bin showing the number of "Yes" and the number of "No". I'll provide pictures of what I have done so far (with the desired result bottom right), but I have been stuck at this point for the past couple of hours. For example, how do I find the number of "Yes" values within the first bin, which is between 1000 and 5800? And then, if possible, how would I recreate this bar plot with my figures? Thanks heaps everyone!

Dataset


First 5 rows of the dataset (columns relevant to the question):

| | MonthlyIncome | Attrition |
|---|---|---|
| 1 | 1081 | Yes |
| 2 | 1232 | No |
| 3 | 1261 | Yes |
| 4 | 1420 | Yes |
| 5 | 1483 | No |
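As the comments below suggest, sharing data as code makes a question reproducible. A minimal sketch that rebuilds only the five rows shown above (the full dataset is not available here); running `dput(head(data, 5))` on the real data frame prints equivalent code that can be pasted straight into the question.

```r
# Rebuild the five rows shown above as a data frame
data <- data.frame(
  MonthlyIncome = c(1081, 1232, 1261, 1420, 1483),
  Attrition     = c("Yes", "No", "Yes", "Yes", "No")
)

# dput() prints R code that recreates an object; head() keeps the output short
dput(head(data, 5))
```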

Rnoob
  • Please see [How to make a great R reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – MonJeanJean Mar 02 '22 at 08:47
  • Hi @MonJeanJean, sorry, I am honestly terrible at R and quite limited to what we have been taught in three classes. I read the link provided and focused on the subset section, but I cannot for the life of me work out how to replicate it with my data. (I have no idea what anything other than the subset part is; I've never seen `dput()` or anything like that before.) Any advice would be much appreciated. Cheers! – Rnoob Mar 02 '22 at 09:13
  • You can share your data with us by running `dput(data)` and pasting the result into your question :) – MonJeanJean Mar 02 '22 at 09:17
  • As @MonJeanJean says, you can run `dput(data)` and then just copy and paste the output from your terminal here as text (in your question). It lets others recreate your data exactly (`dput(head(data))` gives just the first few rows). It looks like a big amount of "code", but it makes helping much easier. – D.J Mar 02 '22 at 09:20
  • Hi guys, I'm really sorry, I don't even know how to do that. I went to the terminal, but the only thing there is `Seans-MBP:Assessment seanbowers$` – Rnoob Mar 02 '22 at 09:23
  • I attached an image above called Dataset that should have the data, I think? – Rnoob Mar 02 '22 at 09:24
  • I think D.J meant the console, not the terminal. – MonJeanJean Mar 02 '22 at 09:25
  • It says there are too many characters? I'm really sorry for my lack of understanding, guys; I'm very appreciative of you all. – Rnoob Mar 02 '22 at 09:28
  • I think I've put the first 5 rows in my original question now, though I'm not sure how to make it neater. Hope it helps. Cheers – Rnoob Mar 02 '22 at 09:44
  • Hi @MonJeanJean, I think I have found a way to do it but can't figure out the exact code. I now know which rows belong to each quartile of the data, and I want to count the number of Yes and No values for each quartile. Would the code be something like this? It gives me the total number of Yes values, but I can't figure out how to count only a certain range of rows: `nrow(data[data$Attrition == "Yes", ])` – Rnoob Mar 02 '22 at 22:21
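For the exact question in the last comment, a sketch using only base R: combine the income condition and the Attrition condition with `&` inside the subset, so only rows in the range are counted. It assumes the bin runs from 1000 to 5800 inclusive on both ends and uses only the five rows shown in the question (all of which fall in that bin); `in_bin` and the other variable names are illustrative.

```r
data <- data.frame(
  MonthlyIncome = c(1081, 1232, 1261, 1420, 1483),
  Attrition     = c("Yes", "No", "Yes", "Yes", "No")
)

# TRUE for rows whose income falls in the assumed 1000-5800 bin
in_bin <- data$MonthlyIncome >= 1000 & data$MonthlyIncome <= 5800

# Combine both conditions with & so only rows in the bin are counted
yes_in_bin <- nrow(data[in_bin & data$Attrition == "Yes", ])
no_in_bin  <- nrow(data[in_bin & data$Attrition == "No", ])

# Shortcut: summing a logical vector counts its TRUE values
yes_in_bin2 <- sum(in_bin & data$Attrition == "Yes")
```

Repeating this for each bin works, but the grouping approach in the answer below does every bin in one step.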

1 Answer


It's possible I've misunderstood what you're hoping to do, but you could try grouping by your Attrition value (together with your income bin) rather than subsetting. You would then count the number of rows in each group (I'd use `group_by()` from the tidyverse for that).
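A minimal sketch of that idea, assuming the `dplyr` package (part of the tidyverse) is installed and using only the five sample rows from the question. The quartile breaks and the `IncomeBin` and `Count` column names are illustrative assumptions, not part of the original answer; for bins like the 1000 to 5800 range in the question, pass your own vector of break points to `cut()` instead.

```r
library(dplyr)

data <- data.frame(
  MonthlyIncome = c(1081, 1232, 1261, 1420, 1483),
  Attrition     = c("Yes", "No", "Yes", "Yes", "No")
)

# Hypothetical bins: quartiles of MonthlyIncome (swap in your own breaks)
breaks <- quantile(data$MonthlyIncome, probs = seq(0, 1, 0.25))

counts <- data %>%
  # cut() turns each income into a bin label; dig.lab avoids scientific notation
  mutate(IncomeBin = cut(MonthlyIncome, breaks = breaks,
                         include.lowest = TRUE, dig.lab = 5)) %>%
  # one group per combination of bin and Attrition value
  group_by(IncomeBin, Attrition) %>%
  # n() counts the rows in each group
  summarise(Count = n(), .groups = "drop")

counts
```

Each row of `counts` corresponds to one bar: a bin, an Attrition value, and how many employees fall in that combination.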

Alternatively, if there's a variable you could sum to get the number of observations for each group, you could use `summarise(sum(var))` and use the result as your y value.
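One possible reading of this suggestion, again a sketch assuming `dplyr` and the five sample rows: summing a logical test such as `Attrition == "Yes"` counts the matching rows, giving one row per bin with separate Yes and No columns. The quartile breaks and the `IncomeBin`, `Yes` and `No` column names are hypothetical.

```r
library(dplyr)

data <- data.frame(
  MonthlyIncome = c(1081, 1232, 1261, 1420, 1483),
  Attrition     = c("Yes", "No", "Yes", "Yes", "No")
)

# Hypothetical bins: quartiles of MonthlyIncome
breaks <- quantile(data$MonthlyIncome, probs = seq(0, 1, 0.25))

by_bin <- data %>%
  mutate(IncomeBin = cut(MonthlyIncome, breaks = breaks,
                         include.lowest = TRUE, dig.lab = 5)) %>%
  group_by(IncomeBin) %>%
  # sum() over TRUE/FALSE counts the TRUE values in each bin
  summarise(Yes = sum(Attrition == "Yes"),
            No  = sum(Attrition == "No"))

by_bin
```

This wide layout reads well as a table; for a grouped bar chart with `fill = Attrition`, the long layout (one row per bin and Attrition value) is usually easier to plot.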

Then you can build your plot using something like:

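library(ggplot2)
# counts: hypothetical data frame with one row per income bin (IncomeBin) and Attrition value
ggplot(counts, aes(fill = Attrition, x = IncomeBin, y = Count)) +
  geom_bar(stat = "identity", position = "dodge")  # "dodge" puts Yes/No bars side by side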
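If ggplot2 feels like too much for now, base R can draw the same kind of grouped bar chart. This is a different technique from the ggplot approach in the answer: `table()` counts Yes and No per bin, and `barplot(beside = TRUE)` places the two bars side by side. The quartile breaks and axis labels are assumptions, and only the five sample rows are used.

```r
data <- data.frame(
  MonthlyIncome = c(1081, 1232, 1261, 1420, 1483),
  Attrition     = c("Yes", "No", "Yes", "Yes", "No")
)

# Hypothetical bins: quartiles of MonthlyIncome
breaks <- quantile(data$MonthlyIncome, probs = seq(0, 1, 0.25))
income_bin <- cut(data$MonthlyIncome, breaks = breaks,
                  include.lowest = TRUE, dig.lab = 5)

# Cross-tabulate: rows are Attrition values (No, Yes), columns are income bins
counts_tab <- table(data$Attrition, income_bin)

# beside = TRUE draws the No and Yes bars next to each other within each bin
barplot(counts_tab, beside = TRUE, legend.text = rownames(counts_tab),
        xlab = "Monthly income", ylab = "Number of employees")
```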
benson23
  • Hi Adylina, thanks for the response. I'm not too sure I understand what you mean by grouping rather than subsetting. By summarising the number of observations for each, do you mean, for example, that there were 50 Yes values altogether, or 50 for a specific bin? I thought the y variable would be a rough total of employees (past and present, accounting for attrition)? I think what you're saying is definitely pointing me in the right direction; I'm just unable to interpret it due to my lack of understanding. Thanks heaps! – Rnoob Mar 02 '22 at 09:35