4

I have data that is binned into classes, as described in this article: https://www.r-bloggers.com/from-continuous-to-categorical/ This makes it easier to see which values are common. After creating those classes, I want to create a bar chart showing the frequency of each class, which I do with the following example code:

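library(ggplot2)

set.seed(1)
# Note: mean(4, sd=2) evaluates to 4, so this draws from a normal with mean 4 and sd 1
df.v <- data.frame(val = rnorm(1000, mean(4, sd=2)))
df.v$val.clss <- cut(df.v$val, seq(min(df.v$val), max(df.v$val), 1))
p1 <- ggplot(data = df.v)+
  geom_bar(aes(val.clss))
plot(p1)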

What I cannot figure out is how to add a vertical line exactly between the two bars around 4, so that the line sits precisely at that x-axis value. I found this question, but it did not help me: "How to get a vertical geom_vline to an x-axis of class date?" Any help is appreciated. Maybe I am too new to adapt the solution to my data.frame; if so, please excuse the question.

Cactus

2 Answers

6

If you know the labels for the two bars you want the line to go between, you can convert their locations to numbers (the factor that they are mapped to), then pass that:

myLoc <- 
  (which(levels(df.v$val.clss) == "(2.99,3.99]") +
     which(levels(df.v$val.clss) == "(3.99,4.99]")) / 
  2


p1 +
  geom_vline(aes(xintercept = myLoc))
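
If the break points passed to `cut()` are kept in a variable, the position can also be derived from the numeric value rather than from hard-coded labels. The following is only a sketch: `brks` and `boundary_pos()` are names introduced here for illustration, not part of the original code, and the approach assumes every level is drawn on the axis (hence `scale_x_discrete(drop = FALSE)`).

```r
library(ggplot2)

# Hypothetical helper (not from the original answer): position of the bar
# boundary closest to `value` on a discrete axis built from cut(x, brks).
# Level j covers (brks[j], brks[j + 1]] and is drawn at x = j, so the edge
# at brks[j] sits at x = j - 0.5.
boundary_pos <- function(brks, value) {
  which.min(abs(brks - value)) - 0.5
}

set.seed(1)
df.v <- data.frame(val = rnorm(1000, mean = 4))
brks <- seq(min(df.v$val), max(df.v$val), 1)
df.v$val.clss <- cut(df.v$val, brks)

p <- ggplot(df.v) +
  geom_bar(aes(val.clss)) +
  scale_x_discrete(drop = FALSE) +  # keep empty levels so positions match
  geom_vline(xintercept = boundary_pos(brks, 4))
```

Calling `print(p)` draws the plot. Because the helper works on the breaks, the same call should carry over to a different bin width without looking up labels by hand.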

If it is skipping groups, you should probably make sure that all levels of the factor are plotted. When you have binned continuous data, it is best not to drop intermediate levels.

p1 +
  geom_vline(aes(xintercept = myLoc)) +
  scale_x_discrete(drop = FALSE)

Alternatively, you could drop the missing levels from the data altogether (prior to plotting and to calculating myLoc):

df.v <- droplevels(df.v)

Then it will only include the levels that would be plotted.
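
The effect of empty levels on the position can be checked without plotting. A small sketch with a made-up factor `f` (invented values, not from the question), showing how the numeric position of a level shifts once empty levels are dropped:

```r
# Toy factor with one empty level ("b"); the values are invented for illustration.
f <- factor(c("a", "c", "c", "d"), levels = c("a", "b", "c", "d"))

n_levels  <- nlevels(f)          # levels defined on the factor
n_nonempty <- sum(table(f) > 0)  # levels that actually contain data

# Position of "c" when all levels are kept on the axis (drop = FALSE)
pos_all <- match("c", levels(f))

# Position of "c" when empty levels are dropped
pos_dropped <- match("c", levels(droplevels(f)))
```

Here `pos_all` and `pos_dropped` differ by one, which is the same kind of shift that moves a line too far to the right when the position is computed from all levels but the plot omits the empty ones.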

As a final option, you could just use geom_histogram which does the binning automatically, but leaves the data on the original scale, which would make adding a line easier.

ggplot(df.v
       , aes(val)) +
  geom_histogram(binwidth = 1) +
  geom_vline(xintercept = 4)
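
One detail worth checking with the histogram approach is where the bin edges fall: if 4 lies inside a bin rather than on an edge, the line will cut through a bar instead of sitting between two bars. A sketch, assuming the `boundary` argument of `geom_histogram()` is used to pin a bin edge at 4 (the object names `p_hist` and `bins` are introduced here):

```r
library(ggplot2)

set.seed(1)
df.v <- data.frame(val = rnorm(1000, mean = 4))

# boundary = 4 requests a bin edge at 4; with binwidth = 1 the edges
# then fall on whole numbers
p_hist <- ggplot(df.v, aes(val)) +
  geom_histogram(binwidth = 1, boundary = 4) +
  geom_vline(xintercept = 4)

# Inspect the computed bins to confirm that 4 is the left edge of a bin
bins <- layer_data(p_hist, 1)
edge_at_4 <- any(abs(bins$xmin - 4) < 1e-6)
```
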
Mark Peterson
  • Thank you, this should work in theory. However, when I change my example code above to `cut(df.v$val, seq(min(df.v$val), max(df.v$val), 0.2))` and thus have 34 classes, and then choose 16.5 as the xintercept, the line is way too far to the right. I have no idea why; could you maybe help? Thank you so much. Well, I found out that `length(levels(df.v$val.clss))` shows me 34, but when I count the bars, I only get 30. This seems to be the root of my problem. – Cactus Oct 03 '16 at 17:35
  • That is most likely caused by categories with no values. `ggplot` drops those by default, which affects the levels on the plot. See the edit for some optional solutions. Note, however, that `droplevels` should be used with caution as it is usually *not* a good idea for continuous data. – Mark Peterson Oct 03 '16 at 18:31
  • Thanks, the `scale_x_discrete(drop = FALSE)` did the trick for automated calculation. I really appreciate the help. – Cactus Oct 04 '16 at 07:45
3

Do you want something like this?

p1 <- ggplot(data = df.v)+
  geom_bar(aes(val.clss)) + geom_vline(xintercept = 3.5, col='red', lwd=2)
plot(p1)

A more generic solution could be this:

df.v <- data.frame(val = rnorm(1000, mean=15, sd=4))
df.v$val.clss <- cut(df.v$val, seq(min(df.v$val), max(df.v$val), 1))

lvls <- levels(df.v$val.clss)
lvls
# [1] "(2.97,3.97]" "(3.97,4.97]" "(4.97,5.97]" "(5.97,6.97]" "(6.97,7.97]" "(7.97,8.97]" "(8.97,9.97]" "(9.97,11]"   "(11,12]"     "(12,13]"
# [11] "(13,14]"     "(14,15]"     "(15,16]"     "(16,17]"     "(17,18]"     "(18,19]"     "(19,20]"     "(20,21]"     "(21,22]"     "(22,23]"
# [21] "(23,24]"     "(24,25]"     "(25,26]"     "(26,27]"     "(27,28]"     "(28,29]"     "(29,30]"

vline.level <- '(18,19]' # you want to draw line here, right before 18

p1 <- ggplot(data = df.v)+
  geom_bar(aes(val.clss)) + geom_vline(xintercept = which(lvls == vline.level) - 0.5, col='red', lwd=2) +
  theme(axis.text.x = element_text(angle=90, vjust = 0.5))
plot(p1)

If you want to choose the middlemost level:

length(lvls)
#[1] 27
# choose the middlemost level, since length(lvls) is odd in this case, the midpoint will be ceiling(length(lvls)/2)
vline.level <- lvls[ceiling(length(lvls)/2)] 

p1 <- ggplot(data = df.v)+
  geom_bar(aes(val.clss)) + geom_vline(xintercept = which(lvls == vline.level) - 0.5, col='red', lwd=2) +
  theme(axis.text.x = element_text(angle=90, vjust = 0.5))
plot(p1)
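
For a line at the exact centre of the axis, whether the number of levels is odd or even, note that n bars sit at positions 1 to n, so the centre is (n + 1) / 2. A minimal sketch (the helper name `axis_centre()` is made up here), assuming no levels are dropped from the plot:

```r
# Hypothetical helper: centre of a discrete axis whose n levels sit at x = 1..n.
# For even n this falls between two bars; for odd n it is the middle bar itself.
axis_centre <- function(n_levels) {
  (n_levels + 1) / 2
}

centre_even <- axis_centre(26)  # 13.5: between bar 13 and bar 14
centre_odd  <- axis_centre(27)  # 14: centre of bar 14, 13 bars on each side
```

By comparison, `ceiling(27 / 2) - 0.5` is 13.5, which leaves 13 bars on one side of the line and 14 on the other. Which of the two is preferable depends on whether the line may pass through a bar.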

Sandipan Dey
  • Exactly. But how do I determine the value for xintercept, especially if I have a different number of categories? Let's say I have six categories as in the example and then I have 10 categories. How do I calculate the value for xintercept? I have a case with 94 categories and the xintercept I found to work was 43.5, which is far from being the half. – Cactus Oct 03 '16 at 17:17
  • Thanks a lot. But the line is still further to the right side, right? 15 bars to the left of it and 10 bars to its right. I want it to be right in the middle, any idea? – Cactus Oct 03 '16 at 17:45
  • This is because I chose the level (18, 19], if you want to choose the middlemost level, you can do so. – Sandipan Dey Oct 03 '16 at 17:48
  • @rashid see the updated code for the vline @ the mid factor level – Sandipan Dey Oct 03 '16 at 17:56