19

I'm having trouble making a histogram in R. The problem is that I tell it to make 5 bins but it makes 4, and I tell it to make 5 and it makes 8 of them.

data <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28, 6.5, 4.32, 2.77, 17.6, 33.26, 52.78, 5.98, 22.48, 20.11, 65.74, 35.73, 56.95, 30.61, 29.82);

hist(data, nclass = 5,freq=FALSE,col="orange",main="Histogram",xlab="x",ylab="f(x)",yaxs="i",xaxs="i")

Any ideas on how to fix it?

Serenthia
Eduardo

5 Answers

25

Use the breaks argument:

hist(data, breaks=seq(0,80,length.out=6),
       freq=FALSE,col="orange",main="Histogram",
       xlab="x",ylab="f(x)",yaxs="i",xaxs="i")
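To confirm the bin count without drawing anything, hist() can be called with plot = FALSE and the returned object inspected. A minimal sketch using the question's data (the names x and h are only for illustration):

```r
x <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28,
       6.5, 4.32, 2.77, 17.6, 33.26, 52.78, 5.98, 22.48, 20.11, 65.74,
       35.73, 56.95, 30.61, 29.82)

# Six equally spaced break points from 0 to 80 define exactly five bins
h <- hist(x, breaks = seq(0, 80, length.out = 6), plot = FALSE)

h$breaks          # 0 16 32 48 64 80
length(h$counts)  # 5
```

Because a vector of breaks is taken literally, the number of bins is always one less than the number of break points.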


Rob Hyndman
14

The integer specified as argument for nclass is used as a suggestion:

the number is a suggestion only
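The effect is easy to see by comparing the breaks hist() actually chooses with what pretty() returns for the same request. This is a small sketch with the question's data; x and b5 are illustrative names:

```r
x <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28,
       6.5, 4.32, 2.77, 17.6, 33.26, 52.78, 5.98, 22.48, 20.11, 65.74,
       35.73, 56.95, 30.61, 29.82)

# A single number is only a hint: hist() rounds it to "pretty" break points
b5 <- hist(x, nclass = 5, plot = FALSE)$breaks

b5                        # 0 20 40 60 80, i.e. 4 bins rather than 5
pretty(range(x), n = 5)   # the same "round" break points
```

This matches the four bins reported in the question for a request of five.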

An alternative solution is to cut your vector into a specified number of groups and plot the result:

plot(cut(data, breaks = 4))
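For the five groups asked about in the question, the same idea can be checked by tabulating the result of cut(). A sketch with the question's data (x and groups are illustrative names); note that plotting a factor gives a bar chart of counts, not a density histogram:

```r
x <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28,
       6.5, 4.32, 2.77, 17.6, 33.26, 52.78, 5.98, 22.48, 20.11, 65.74,
       35.73, 56.95, 30.61, 29.82)

# A single number here means "this many equal-width intervals", taken literally
groups <- cut(x, breaks = 5)

nlevels(groups)   # 5
table(groups)     # counts per interval
```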


Sven Hohenstein
12

Building on the answer from Rob Hyndman:

Maybe a more generic solution would be to build the breaks from the minimum and maximum values of the data, with the number of breaks = number_of_bins+1.

number_of_bins <- 5  # desired number of bins
hist(data,breaks=seq(min(data),max(data),length.out=number_of_bins+1),
     freq=FALSE,col="orange",
     main="Histogram",xlab="x",ylab="f(x)",yaxs="i",xaxs="i")
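The same idea could be wrapped in a small function so the bin count is a single argument. This is only a sketch: hist_n_bins is a hypothetical helper name, not something from base R.

```r
# Hypothetical helper: draw a histogram with exactly n_bins equal-width bins
# spanning the observed range; extra arguments are passed on to hist()
hist_n_bins <- function(x, n_bins, ...) {
  breaks <- seq(min(x), max(x), length.out = n_bins + 1)
  hist(x, breaks = breaks, ...)
}

x <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28,
       6.5, 4.32, 2.77, 17.6, 33.26, 52.78, 5.98, 22.48, 20.11, 65.74,
       35.73, 56.95, 30.61, 29.82)

h <- hist_n_bins(x, 5, plot = FALSE)
length(h$counts)  # 5
```

Usage for a plot would look like hist_n_bins(x, 5, freq = FALSE, col = "orange").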
Matthew Lundberg
jlainoc
2

If you are not opposed to using something other than base graphics, there is always the ggplot2 way of doing things:

library(ggplot2)

data <- data.frame(x=data)

ggplot(data, aes(x=x))+
  geom_histogram(binwidth=18,color="black", fill="grey")+
  scale_x_continuous(breaks=c(0,20,40,60,80))

ggplot2 has great documentation at: https://ggplot2.tidyverse.org/

For histogram specific examples: https://ggplot2.tidyverse.org/reference/geom_histogram.html

Sardimus
2

I like to be quite accurate about my data points:

hist(data,breaks = seq(min(data),max(data),by=((max(data) - min(data))/(length(data)-1))))

This should automate the process with little manual input.
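Note that this step size produces length(data) - 1 bins, so 23 bins for the 24 values in the question rather than 5. A sketch of an equivalent formulation, assuming length.out is an acceptable substitute for the computed by (x and breaks_len are illustrative names):

```r
x <- c(5.28, 14.64, 37.25, 78.9, 44.92, 8.96, 19.22, 34.81, 33.89, 24.28,
       6.5, 4.32, 2.77, 17.6, 33.26, 52.78, 5.98, 22.48, 20.11, 65.74,
       35.73, 56.95, 30.61, 29.82)

# One break point per observation, so the endpoints are exactly min(x) and max(x)
breaks_len <- seq(min(x), max(x), length.out = length(x))
h <- hist(x, breaks = breaks_len, plot = FALSE)

length(h$counts)  # 23 bins for 24 observations
```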

mjp