-1

[Image: histogram of redwine$quality]

I plotted this histogram using the R code `hist(redwine$quality)`. I wanted to make it look more like a normal distribution plot :(

Edward
  • It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick Jul 07 '20 at 04:57
  • 2
    Your data (or the majority of it) is not _continuous_. A histogram is therefore not the appropriate graph. – Edward Jul 07 '20 at 05:14
  • **Because your data is continuous numbers not integers. And if you use `as.integer` that rounds it down, introducing bias.** Duplicate of: [Why is the first bar so big in my R histogram?](https://stackoverflow.com/questions/43967838/why-is-the-first-bar-so-big-in-my-r-histogram) – smci Jul 07 '20 at 05:59
  • This question is perfectly clear and has no reason to be closed – David Jul 07 '20 at 06:40

2 Answers

2

The problem is that your data are not continuous. "Sturges", the default method for choosing the number of bins (ceiling(1 + log2(n))), and the break-points that `hist` derives from it, often work poorly for discrete data.

# Simulated quality scores (3 to 8) and how often each one occurs
vals <- 3:8
times <- c(20, 100, 690, 650, 200, 30)
quality <- rep(vals, times = times)

h1 <- hist(quality)

[Image: histogram of quality with the default breaks]

h1$breaks
#[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
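
To see where those break-points come from, here is a minimal sketch that rebuilds them by hand. It assumes the simulated `quality` vector from above and relies on the behavior described in `?hist`: the Sturges bin count is only a suggestion, which is handed to `pretty()` to pick round break-points. The names `n_bins` and `manual_breaks` are only for illustration.

```r
# Rebuild the simulated quality scores so this block runs on its own
vals <- 3:8
times <- c(20, 100, 690, 650, 200, 30)
quality <- rep(vals, times = times)

# Number of bins suggested by Sturges' formula: ceiling(1 + log2(n))
n_bins <- nclass.Sturges(quality)

# hist() treats that number as a suggestion and lets pretty() choose
# "round" break-points that cover the range of the data
manual_breaks <- pretty(range(quality), n = n_bins, min.n = 1)

# Compare with the breaks hist() actually uses (plot = FALSE skips drawing)
h1 <- hist(quality, plot = FALSE)
all.equal(h1$breaks, manual_breaks)

# Bins that can never receive an integer score stay empty
h1$counts
```

Because the scores are integers and the bins are 0.5 wide, several bins stay empty, which is what produces the gaps in the default plot.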

Solution: Specify a vector for the breaks argument.

hist(quality, breaks=2:8)

[Image: histogram of quality with breaks=2:8]
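
With `breaks=2:8` and the default right-closed intervals, each score falls in the bin that ends at it, so the bar for 3 spans 2 to 3. As a possible variation (not part of the original answer), the sketch below centers unit-wide bins on each integer instead. The name `centered` is just for illustration.

```r
# Rebuild the simulated quality scores so this block runs on its own
vals <- 3:8
times <- c(20, 100, 690, 650, 200, 30)
quality <- rep(vals, times = times)

# Unit-wide bins centered on each integer: 2.5-3.5, 3.5-4.5, ..., 7.5-8.5
centered <- seq(min(quality) - 0.5, max(quality) + 0.5, by = 1)
h2 <- hist(quality, breaks = centered)

# Each bin now holds exactly one score, so the counts match table(quality)
h2$counts
h2$mids
```

The bin midpoints then coincide with the scores themselves, so the axis labels line up with the bars.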

Or use a barplot.

barplot(table(quality))

[Image: bar plot of table(quality)]

Edward
0

You can reduce the number of breaks.

hist(iris$Petal.Length, breaks=4)

You can also add a curve but keep the original breaks.

hist(iris$Petal.Length, freq=FALSE)
curve(dnorm(x, mean=mean(iris$Petal.Length), sd=sd(iris$Petal.Length)), add=TRUE, col="red")
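
The same overlay can be tried on discrete scores like those in the question. This is only a sketch: it assumes the simulated `quality` vector from the other answer (the real `redwine` data is not shown) and uses unit-wide bins centered on each integer, which is my choice rather than anything from the question.

```r
# Simulated quality scores, borrowed from the other answer
vals <- 3:8
times <- c(20, 100, 690, 650, 200, 30)
quality <- rep(vals, times = times)

# One unit-wide bin per score; freq = FALSE plots densities, not counts
centered <- seq(min(quality) - 0.5, max(quality) + 0.5, by = 1)
h3 <- hist(quality, breaks = centered, freq = FALSE)

# Normal density with the sample mean and standard deviation
curve(dnorm(x, mean = mean(quality), sd = sd(quality)), add = TRUE, col = "red")
```

With bins of width 1, each bar's density equals the proportion of observations at that score, so the bars and the curve are on the same scale. The curve only shows how close the scores are to a normal shape; it does not change the data.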
David