15

I need to add a line for the mean and show the value of the mode, for example, on plots like this one:

I use this to calculate the number of bins:

bw <- diff(range(cars$lenght)) / (2 * IQR(cars$lenght) / length(cars$lenght)^(1/3))

And the plot:

ggplot(data=cars, aes(lenght)) + 
  geom_histogram(aes(y =..density..), 
                 col="red",
                 binwidth = bw,
                 fill="green", 
                 alpha=1) + 
  geom_density(col=4) + 
  labs(title='Length Plot', x='Length', y='Density')

cars$lenght

168.8 168.8 171.2 176.6 176.6 177.3 192.7 192.7 192.7 178.2 176.8 176.8 176.8 176.8 189.0 189.0 193.8 197.0 141.1 155.9 158.8 157.3 157.3 157.3 157.3 157.3 157.3 157.3 174.6 173.2
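
To make the example reproducible, the values printed above can be put into a data frame. The sketch below does that (the name `cars_df` is introduced here for illustration and is not from the original post). It also separates two quantities that are easy to confuse: the Freedman-Diaconis rule gives a bin width of `2 * IQR(x) / n^(1/3)`, while the expression used for `bw` above divides the range by that width, which yields an approximate bin count. Passing a count to `binwidth` still draws a plot, but with bins of a different width than the rule suggests.

```r
# Values copied from the question; `cars_df` is a hypothetical name
lenght <- c(168.8, 168.8, 171.2, 176.6, 176.6, 177.3, 192.7, 192.7, 192.7,
            178.2, 176.8, 176.8, 176.8, 176.8, 189.0, 189.0, 193.8, 197.0,
            141.1, 155.9, 158.8, 157.3, 157.3, 157.3, 157.3, 157.3, 157.3,
            157.3, 174.6, 173.2)
cars_df <- data.frame(lenght = lenght)

n <- length(cars_df$lenght)

# Freedman-Diaconis bin width: 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(cars_df$lenght) / n^(1/3)

# Approximate number of bins implied by that width
fd_bins <- ceiling(diff(range(cars_df$lenght)) / fd_width)

fd_width
fd_bins
```

With these two values, `geom_histogram(binwidth = fd_width)` or `geom_histogram(bins = fd_bins)` should give roughly the same binning, whereas `binwidth = fd_bins` would not.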

Thanks in advance.

neilfws
Borja_042

2 Answers

21

I'm not sure how to replicate your data, so I used cars$speed in its place.

geom_vline will place vertical lines where you want, and you can calculate the mean and mode of the raw data on the fly. But if you want the mode as the histogram bin with the highest frequency, you can extract that from the ggplot object.

I'm not sure how you want to define the mode, so I plotted several different approaches.

library(ggplot2)
library(dplyr)  # provides %>% and filter()

# function to calculate the mode (most frequent raw value)
fun.mode <- function(x) { as.numeric(names(sort(-table(x)))[1]) }

# cars$speed stands in for the length column throughout
bw <- diff(range(cars$speed)) / (2 * IQR(cars$speed) / length(cars$speed)^(1/3))
p <- ggplot(data = cars, aes(speed)) + 
  geom_histogram(aes(y = ..density..), 
                 col = "red",
                 binwidth = bw,
                 fill = "green", 
                 alpha = 1) + 
  geom_density(col = 4) + 
  labs(title = 'Length Plot', x = 'Length', y = 'Density')

# Extract data for the histogram and density peaks
data <- ggplot_build(p)$data
hist_peak <- data[[1]] %>% filter(y == max(y)) %>% .$x
dens_peak <- data[[2]] %>% filter(y == max(y)) %>% .$x

# plot mean, mode, histogram peak and density peak
# (constants go outside aes() so each line is drawn once)
p +
  geom_vline(xintercept = mean(cars$speed), col = 'red', size = 2) +
  geom_vline(xintercept = fun.mode(cars$speed), col = 'blue', size = 2) +
  geom_vline(xintercept = hist_peak, col = 'orange', size = 2) +
  geom_vline(xintercept = dens_peak, col = 'purple', size = 2) +
  annotate("text", x = hist_peak, y = 0, label = round(hist_peak, 1),
           vjust = -1, col = 'orange', size = 5)
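
The four vertical lines need not coincide, because they come from different definitions of "mode". As a rough numerical cross-check that does not depend on ggplot2 internals, similar quantities can be approximated in base R. This is only a sketch: `hist()` and `density()` use their own default breaks and bandwidth, so the bin peak and density peak will generally differ somewhat from the ones ggplot2 draws. The variable names are introduced here for illustration.

```r
x <- cars$speed  # built-in dataset, same stand-in as in the answer

# Most frequent raw value (first one if there are ties)
tab <- table(x)
raw_mode <- as.numeric(names(tab)[which.max(tab)])

# Midpoint of the fullest histogram bin (default breaks)
h <- hist(x, plot = FALSE)
bin_peak <- h$mids[which.max(h$counts)]

# Location of the highest point of the kernel density estimate
d <- density(x)
kde_peak <- d$x[which.max(d$y)]

c(mean = mean(x), raw_mode = raw_mode, bin_peak = bin_peak, kde_peak = kde_peak)
```

Which of these counts as "the mode" depends on the purpose: the raw mode is sensitive to rounding in the data, while the bin and density peaks depend on the bin width and bandwidth chosen.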


dule arnaux
  • Got the mode function from an answer here: https://stackoverflow.com/questions/2547402/is-there-a-built-in-function-for-finding-the-mode – dule arnaux Nov 01 '17 at 01:33
  • Hi @dulearnaux, I would appreciate it if you could help me understand how to plot the values on the lines (mean, median and mode) and how to put these into a legend. – BPDESILVA May 07 '19 at 22:36
1

Create a data.frame which has a value for each statistic you want to plot. This has the advantage of automatically creating a legend for each statistic.

library(ggplot2)

# cars$speed stands in for the length column
cars$length <- cars$speed
bw <- diff(range(cars$length)) / (2 * IQR(cars$length) / length(cars$length)^(1/3))

sumstatz <- data.frame(whichstat = c("mean",
                                     "sd upr", 
                                     "sd lwr"),
                       value     = c(mean(cars$length),
                                     mean(cars$length)+sd(cars$length),
                                     mean(cars$length)-sd(cars$length)))

ggplot(data=cars, aes(length)) + 
  geom_histogram(aes(y =..density..),
                 col="black",
                 binwidth = bw) + 
  geom_density(col="black") + 
  geom_vline(data=sumstatz,aes(xintercept = value,
                               linetype = whichstat,
                               col = whichstat),size=1)+
  labs(title='Length Plot', x='Length', y='Density')
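
The same pattern extends to other statistics, and the summary data frame can also drive text labels, which is roughly what the comment on the other answer asks about. The sketch below is one possible way to do it, not the only one: `stat_lines` and `raw_mode` are names introduced here, `bins = 10` is an arbitrary choice, and `after_stat(density)` is the newer spelling of `..density..` (it assumes ggplot2 3.3.0 or later).

```r
library(ggplot2)

x <- cars$speed  # stand-in for the length column

# Most frequent raw value (first one if there are ties); hypothetical helper
raw_mode <- function(v) {
  tab <- table(v)
  as.numeric(names(tab)[which.max(tab)])
}

# One row per statistic; this data frame drives both the lines and the labels
stat_lines <- data.frame(
  whichstat = c("mean", "median", "mode"),
  value     = c(mean(x), median(x), raw_mode(x))
)

p <- ggplot(cars, aes(speed)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,
                 col = "black", fill = "grey80") +
  # Mapping col and linetype to whichstat creates the legend automatically
  geom_vline(data = stat_lines,
             aes(xintercept = value, col = whichstat, linetype = whichstat)) +
  # Print each value next to its line, at the bottom of the panel
  geom_text(data = stat_lines,
            aes(x = value, y = 0, label = round(value, 1), col = whichstat),
            vjust = -0.5, hjust = -0.1, show.legend = FALSE) +
  labs(x = "Speed", y = "Density", col = "Statistic", linetype = "Statistic")

p
```

Giving `col` and `linetype` the same legend title in `labs()` lets ggplot2 merge them into a single legend. Labels may overlap when two statistics are close together, in which case adjusting `hjust` or `vjust` per row is one option.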


Richard N. Belcher