
I am making a histogram with ggplot2, using facet_grid with two discrete variables, all combinations of which exist in the data. I would like to show percentages on the y-axis (this is what is required in this particular case, even though there are good arguments for using density). I initially posted this question but have since added variables and changed my approach, and I would now prefer to use ggplot2.

I have binned the data outside ggplot2 and used geom_bar. This is the approach described in the second answer to the following question - i.e. the answer by the OP, Feng Mai:

Let ggplot2 histogram show classwise percentages on y axis

(I also tried the approach described in the accepted answer to the above question, which is by Rorschach. However, the ns for each panel still appeared to be calculated from the total number of participants rather than classwise. For now I have made more headway with binning.)
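A minimal sketch of the binning step in base R, to make the approach concrete. The data frame `raw` and its values are hypothetical and only illustrate the idea; with the real data, TimeFactor would be an additional grouping variable alongside Profile.

```r
# Hypothetical raw data: one row per participant, with a 0-100 score
raw <- data.frame(
  Profile = factor(c("1", "1", "1", "1", "2", "2")),
  score   = c(5, 25, 28, 95, 10, 100)
)

# Bin the scores into ten intervals; include.lowest = TRUE keeps a score of 0
raw$score_cut <- cut(raw$score, breaks = seq(0, 100, by = 10),
                     include.lowest = TRUE)

# Count participants per Profile and bin, dropping empty combinations
counts <- as.data.frame(
  table(Profile = raw$Profile, score_cut = raw$score_cut),
  responseName = "n"
)
counts <- counts[counts$n > 0, ]

# Classwise proportion: divide each count by the total for its own Profile
counts$freq <- counts$n / ave(counts$n, counts$Profile, FUN = sum)

counts
```

Because `freq` is computed within each Profile, the proportions sum to 1 per class rather than across all participants.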

My problem is that I end up with unexpected results on the x-axis (see reprex below):

  1. The x-axis is overwritten several times. I think this is because an x-axis is being produced for every level of the variable Profile, but I am not sure how to stop this happening.

  2. The x-axis shows the brackets produced by the cut function. Perhaps this is related to the first problem, perhaps not. EDIT: I was expecting an x-axis showing 1-100 underneath each column, rather than an x-axis showing bins with brackets. User Feng Mai seems to have achieved this in the question that I linked above.

I would be very grateful for advice on fixing these problems.

  library(tidyverse)  # attaches ggplot2, so a separate library(ggplot2) is not needed
  library(scales)

  
d <-  data.frame(
              n = c(1L,7L,8L,17L,31L,24L,13L,16L,
                    14L,14L,17L,19L,14L,4L,1L,2L,15L,18L,13L,14L,17L,
                    6L,10L,4L,3L,1L,3L,5L,6L,22L,24L,21L,10L,4L,
                    6L,5L,8L,29L,17L,34L,8L,9L,3L,5L,19L,16L,20L,4L,
                    1L,9L,14L,35L,1L,3L,5L,32L,33L,17L,44L,23L,11L,
                    9L,4L,4L,4L,5L,11L,17L,14L,16L,11L,6L,3L,2L,
                    15L,32L,42L),
           freq = c(0.0099009900990099,
                    0.0693069306930693,0.0792079207920792,0.168316831683168,0.306930693069307,
                    0.237623762376238,0.128712871287129,0.158415841584158,
                    0.138613861386139,0.138613861386139,0.168316831683168,
                    0.188118811881188,0.138613861386139,0.0396039603960396,
                    0.0099009900990099,0.0198019801980198,0.148514851485149,
                    0.178217821782178,0.128712871287129,0.138613861386139,
                    0.168316831683168,0.0594059405940594,0.099009900990099,
                    0.0396039603960396,0.0297029702970297,0.0099009900990099,
                    0.0297029702970297,0.0495049504950495,0.0594059405940594,
                    0.217821782178218,0.237623762376238,0.207920792079208,
                    0.099009900990099,0.0396039603960396,0.0594059405940594,
                    0.0847457627118644,0.135593220338983,0.491525423728814,0.288135593220339,
                    0.576271186440678,0.135593220338983,0.152542372881356,
                    0.0508474576271186,0.0847457627118644,0.322033898305085,
                    0.271186440677966,0.338983050847458,0.0677966101694915,
                    0.0169491525423729,0.152542372881356,0.23728813559322,
                    0.593220338983051,0.010989010989011,0.032967032967033,
                    0.0549450549450549,0.351648351648352,0.362637362637363,
                    0.186813186813187,0.483516483516484,0.252747252747253,0.120879120879121,
                    0.0989010989010989,0.043956043956044,0.043956043956044,
                    0.043956043956044,0.0549450549450549,0.120879120879121,
                    0.186813186813187,0.153846153846154,0.175824175824176,
                    0.120879120879121,0.0659340659340659,0.032967032967033,
                    0.021978021978022,0.164835164835165,0.351648351648352,
                    0.461538461538462),
        Profile = as.factor(c("1","1","1",
                              "1","1","1","1","1","1","1","1","1","1",
                              "1","1","1","1","1","1","1","1","1","1",
                              "1","1","1","1","1","1","1","1","1","1",
                              "1","1","2","2","2","2","2","2","2","2",
                              "2","2","2","2","2","2","2","2","2","3",
                              "3","3","3","3","3","3","3","3","3","3",
                              "3","3","3","3","3","3","3","3","3","3",
                              "3","3","3","3")),
     TimeFactor = as.factor(c("FactorScoreO",
                              "FactorScoreO","FactorScoreO","FactorScoreO",
                              "FactorScoreO","FactorScoreO","FactorScoreO",
                              "FactorScoreM","FactorScoreM","FactorScoreM",
                              "FactorScoreM","FactorScoreM","FactorScoreM",
                              "FactorScoreM","FactorScoreM","FactorScoreM",
                              "FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreP","FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreD","FactorScoreD","FactorScoreD",
                              "FactorScoreD","FactorScoreD","FactorScoreD",
                              "FactorScoreD","FactorScoreD","FactorScoreD",
                              "FactorScoreO","FactorScoreO","FactorScoreO",
                              "FactorScoreO","FactorScoreM","FactorScoreM",
                              "FactorScoreM","FactorScoreM","FactorScoreM",
                              "FactorScoreP","FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreD","FactorScoreD","FactorScoreD",
                              "FactorScoreD","FactorScoreO","FactorScoreO",
                              "FactorScoreO","FactorScoreO","FactorScoreO",
                              "FactorScoreO","FactorScoreM","FactorScoreM",
                              "FactorScoreM","FactorScoreM","FactorScoreM",
                              "FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreP","FactorScoreP","FactorScoreP","FactorScoreP",
                              "FactorScoreD","FactorScoreD","FactorScoreD",
                              "FactorScoreD")),
      score_cut = as.factor(c("(20,30]",
                              "(40,50]","(50,60]","(60,70]","(70,80]","(80,90]",
                              "(90,100]","[0,10]","(10,20]","(20,30]",
                              "(30,40]","(40,50]","(50,60]","(60,70]","(70,80]",
                              "(80,90]","[0,10]","(10,20]","(20,30]","(30,40]",
                              "(40,50]","(50,60]","(60,70]","(70,80]",
                              "(80,90]","(90,100]","(10,20]","(20,30]","(30,40]",
                              "(40,50]","(50,60]","(60,70]","(70,80]",
                              "(80,90]","(90,100]","(60,70]","(70,80]","(80,90]",
                              "(90,100]","[0,10]","(10,20]","(20,30]",
                              "(30,40]","(40,50]","[0,10]","(10,20]","(20,30]",
                              "(30,40]","(60,70]","(70,80]","(80,90]",
                              "(90,100]","(40,50]","(50,60]","(60,70]","(70,80]",
                              "(80,90]","(90,100]","[0,10]","(10,20]",
                              "(20,30]","(30,40]","(40,50]","[0,10]","(10,20]",
                              "(20,30]","(30,40]","(40,50]","(50,60]",
                              "(60,70]","(70,80]","(80,90]","(90,100]","(60,70]",
                              "(70,80]","(80,90]","(90,100]"))
   )
  

  # The percentages are pre-computed, so geom_col() (identity stat) draws them as-is.
  # The plot object is named p to avoid masking base::hist().
  p <- ggplot(d, aes(x = score_cut, y = freq * 100, fill = Profile)) +
    geom_col(position = "dodge") +
    xlab("Raw unweighted factor score") + ylab("Percentage of participants") +
    theme_bw() +
    facet_grid(vars(TimeFactor), vars(Profile)) +
    theme(panel.grid.major = element_blank()) +
    scale_fill_manual(values = c("#56B4E9", "#E69F00", "#999999"))

  p

Created on 2021-04-30 by the reprex package (v0.3.0)

  • What is your expected x-axis? Do you want every bin to appear only once? – Ran K Apr 30 '21 at 10:29
  • @RanK I was expecting an x-axis showing 1-100 underneath each column, rather than bins being shown with brackets. I'll update my question to make this clear. – user15545413 Apr 30 '21 at 11:52
  • If you're creating the cuts yourself, use the `labels` argument in cut. Check `?cut`. Also, please kindly consider asking much shorter questions. I am quite inclined to give a downvote, but it's Friday. – tjebo Apr 30 '21 at 12:18
  • @tjebo, thanks very much for the suggestion, I will check the `labels` argument in cut. Sorry about the length of the question. I am fairly new here; can you help me for next time by telling me which parts I should have cut? The references to the other questions? – user15545413 Apr 30 '21 at 13:07
  • Sure thing! (That's another reason why I didn't downvote: because you're new here!) The main thing is that you're creating a lot of code for a question that is essentially a "how to label cuts" question, which could be boiled down to two values only. It is also irrelevant whether these are bars or a histogram or anything else; it's about being a discrete variable. You also generally don't need to give that much background. What you have done, and this is very useful, is to let us know what you have tried before (i.e., either link to similar questions, or show code attempts, or both). – tjebo Apr 30 '21 at 13:11
  • @tjebo thanks, that's very helpful. It's tempting to make a big reprex because it looks 'closer' to the result using the full data, so *feels* easier to get a sense of whether it's working - but this may not actually be the case! I take your point and will try to cut it back next time. I think the same goes for histograms, bars, etc. - throwing in the kitchen sink 'just in case' someone says - ah, if it's a histogram here is this completely different solution! – user15545413 Apr 30 '21 at 13:21
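
A minimal sketch of the `labels` suggestion from the comments, in base R. The label style ("0-10", "10-20", ...) and the example vectors `scores` and `binned` are assumptions chosen for illustration. The second case relabels strings that were already binned, as in the reprex, and also restores the numeric bin order, which `as.factor()` on character strings does not preserve because it sorts the levels alphabetically.

```r
breaks <- seq(0, 100, by = 10)

# Plain labels, one per bin: "0-10", "10-20", ..., "90-100"
bin_labels <- paste(head(breaks, -1), breaks[-1], sep = "-")

# Case 1: binning raw scores, supplying the labels directly to cut()
scores <- c(5, 25, 28, 95, 10, 100)   # hypothetical scores
score_cut <- cut(scores, breaks = breaks, include.lowest = TRUE,
                 labels = bin_labels)

# Case 2: data that already holds the bracketed strings produced by cut().
# Recover cut()'s default levels in numeric order, then map them to the labels.
old_levels <- levels(cut(numeric(0), breaks = breaks, include.lowest = TRUE))
binned <- c("(20,30]", "[0,10]", "(90,100]")   # hypothetical pre-binned values
relabelled <- factor(binned, levels = old_levels, labels = bin_labels)

levels(relabelled)
```

The relabelled factor can then be mapped to x in the existing ggplot call. If the labels still overlap in narrow facets, rotating them with `theme(axis.text.x = element_text(angle = 45, hjust = 1))` is one common option.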

0 Answers