I plotted this histogram using the R code `hist(redwine$quality)`, and I want to make it look more like a normal distribution plot
- It's easier to help you if you include a simple [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input that can be used to test and verify possible solutions. – MrFlick Jul 07 '20 at 04:57
- Your data (or the majority of it) is not _continuous_. A histogram is therefore not the appropriate graph. – Edward Jul 07 '20 at 05:14
- **Because your data is continuous numbers, not integers. And if you use `as.integer`, it rounds down, introducing bias.** Duplicate of: [Why is the first bar so big in my R histogram?](https://stackoverflow.com/questions/43967838/why-is-the-first-bar-so-big-in-my-r-histogram) – smci Jul 07 '20 at 05:59
- This question is perfectly clear and has no reason to be closed. – David Jul 07 '20 at 06:40
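To illustrate the rounding point in smci's comment, here is a small sketch on made-up values (not the wine data). `as.integer()` truncates toward zero, so every positive value is pushed down and the mean shifts downward, whereas `round()` goes to the nearest integer and avoids that one-directional shift.

```r
# Hypothetical values, made up for illustration only
x <- c(4.8, 5.3, 5.7, 6.2)

as.integer(x)        # truncates toward zero: 4 5 5 6
round(x)             # rounds to nearest:     5 5 6 6

mean(x)              # about 5.5
mean(as.integer(x))  # 5, biased downward
mean(round(x))       # 5.5
```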
2 Answers
The problem is that your data are not continuous. Sturges' rule, the default method `hist()` uses to choose the "optimal" number of classes (`ceiling(1 + log2(n))`), and therefore the breakpoints derived from it, often fails for discrete data.
```r
vals <- 3:8
times <- c(20, 100, 690, 650, 200, 30)
# Simulated discrete scores: each vals[i] repeated times[i] times
quality <- rep(vals, times = times)
h1 <- hist(quality)
h1$breaks
#[1] 3.0 3.5 4.0 4.5 5.0 5.5 6.0 6.5 7.0 7.5 8.0
```
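One way to see why the default histogram looks odd is to compute the Sturges suggestion and inspect the bin counts directly. This sketch rebuilds the same simulated `quality` vector (made-up counts, not the real red-wine data); `plot = FALSE` just returns the histogram object without drawing it.

```r
# Same simulated data as above (hypothetical counts)
vals    <- 3:8
times   <- c(20, 100, 690, 650, 200, 30)
quality <- rep(vals, times = times)

# Sturges' rule: ceiling(1 + log2(n)) suggested classes
n <- length(quality)
ceiling(1 + log2(n))       # 12 for n = 1690
nclass.Sturges(quality)    # same value

# hist() passes that suggestion to pretty(), which returns breaks 0.5 apart,
# so every other bin contains no integer value and stays empty
h1 <- hist(quality, plot = FALSE)
h1$counts                  # 20 100 0 690 0 650 0 200 0 30
```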
Solution: specify a vector of break points for the `breaks` argument, so that each integer value gets its own bin.
```r
hist(quality, breaks = 2:8)
```
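A sketch checking what `breaks = 2:8` does on the simulated `quality` vector: with R's default right-closed bins, `(2,3]`, `(3,4]`, ..., each integer lands in its own bin. The half-integer breaks at the end are a variation not taken from the answer, shown under the assumption that you want each bar centred on its score.

```r
vals    <- 3:8
times   <- c(20, 100, 690, 650, 200, 30)
quality <- rep(vals, times = times)

# One integer per bin: the counts reproduce `times`
h2 <- hist(quality, breaks = 2:8, plot = FALSE)
h2$counts

# Variation: breaks at x.5 put each integer in the middle of its bar
half_breaks <- seq(min(quality) - 0.5, max(quality) + 0.5, by = 1)
h3 <- hist(quality, breaks = half_breaks, plot = FALSE)
h3$mids                    # bar centres: 3 4 5 6 7 8
plot(h3, main = "quality", xlab = "quality")
```

Both versions give the same counts; the centred one only changes where the bars sit relative to the axis labels.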
Or use a barplot.
```r
barplot(table(quality))
```
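`table()` only lists values that actually occur, so a score that never appears gets no bar at all. A sketch on hypothetical scores (the value 4 is missing on purpose) that converts to a factor with explicit levels, so the bar plot keeps a zero-height bar for it:

```r
# Hypothetical scores in which the value 4 never occurs
scores <- c(3, 3, 5, 5, 5, 6)

table(scores)              # only 3, 5 and 6 appear
counts <- table(factor(scores, levels = min(scores):max(scores)))
counts                     # 3: 2, 4: 0, 5: 3, 6: 1
barplot(counts, xlab = "score", ylab = "count")
```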

Edward
You can reduce the number of breaks.
```r
hist(iris$Petal.Length, breaks = 4)
```
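When `breaks` is a single number, `hist()` treats it only as a suggestion and rounds the break points to "pretty" values, so the plot may not have exactly that many bins. A sketch on the built-in `iris` data comparing the suggested version with explicitly computed equal-width breaks (`exact_breaks` is just an illustrative name):

```r
x <- iris$Petal.Length

# A single number is only a hint; inspect the breaks hist() actually used
h_suggested <- hist(x, breaks = 4, plot = FALSE)
h_suggested$breaks
length(h_suggested$breaks) - 1   # number of bins actually used

# For exactly 4 equal-width bins, supply the break points yourself
exact_breaks <- seq(min(x), max(x), length.out = 5)
hist(x, breaks = exact_breaks)
```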
You can also keep the original breaks and add a normal curve. Use `freq = FALSE` so the histogram shows densities on the same scale as `dnorm()`.
```r
hist(iris$Petal.Length, freq = FALSE)
curve(dnorm(x, mean = mean(iris$Petal.Length), sd = sd(iris$Petal.Length)), add = TRUE, col = "red")
```
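Because `hist()` sets the y-axis limits from the bar heights only, the overlaid curve can be clipped at the top. The function below is a hypothetical helper (`hist_with_normal` is not part of base R) that draws the same density histogram and normal curve but widens `ylim` to include the curve's peak. It is a sketch that assumes a numeric vector without missing values.

```r
# Hypothetical helper: density histogram plus a normal curve using the
# sample mean and SD, with ylim stretched so the curve is not cut off
hist_with_normal <- function(values, breaks = "Sturges", ...) {
  m <- mean(values)
  s <- sd(values)
  h <- hist(values, breaks = breaks, plot = FALSE)
  peak <- dnorm(m, mean = m, sd = s)      # height of the normal curve at its mode
  ylim <- c(0, max(h$density, peak))
  plot(h, freq = FALSE, ylim = ylim, ...)
  curve(dnorm(x, mean = m, sd = s), add = TRUE, col = "red")
  invisible(list(hist = h, ylim = ylim))
}

hist_with_normal(iris$Petal.Length, main = "Petal length", xlab = "Petal.Length")
```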

David