I've tried different functions and several different arguments but the histogram

hist(estimator, probability=T, br=5)

isn't showing densities on the y-axis no matter what plotting function or argument I choose. The vector "estimator" contains 100 values between roughly 0.4 and 0.6.

To be precise: creating a hist object, calculating and changing the densities, and then plotting it again with plot() works, but I don't want the hist object to be plotted in the first place :/

Zong
Martin Schmelzer
  • What is it showing instead? When I run: `estimator=rnorm(100); hist(estimator, probability=TRUE, br=5)` The y-axis looks like density to me! – Justin Nov 28 '12 at 22:41
  • I'm guessing it is showing a density but you're under the mistaken assumption that a density can't be greater than 1? – Dason Nov 28 '12 at 22:43
  • I added the histogram I get when I use just the command above. I could go the way I described above by saving the hist as an object, adjusting the densities and plotting it again, but then two plots are drawn. That's not what I want. – Martin Schmelzer Nov 28 '12 at 22:47
  • I don't think I like what you're doing but if you just want to suppress the actual plotting of the histogram then you can use `plot=FALSE` as a parameter. This is in `?hist` – Dason Nov 28 '12 at 22:56

1 Answer

When you specify probability=T (or better yet probability=TRUE, so that nothing breaks if T is reassigned to something other than TRUE), the bars are scaled so that the total area of the histogram is 1. Since the width of your bars is quite a bit less than 1, the heights need to be greater than 1 for the areas to add up to 1. This makes it easy to superimpose a density estimate curve or a theoretical density curve, or to add other references.
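The scaling can be checked numerically. The following is a minimal sketch: the vector is simulated stand-in data (an assumption, chosen to resemble the question's 100 values between 0.4 and 0.6), and plot=FALSE computes the histogram without drawing anything.

```r
# Simulated stand-in for the question's data
# (assumption: uniform on [0.4, 0.6]; the real "estimator" is not available)
set.seed(1)
estimator <- runif(100, min = 0.4, max = 0.6)

# Compute the histogram without drawing it
h <- hist(estimator, breaks = 5, plot = FALSE)

bin_widths <- diff(h$breaks)          # width of each bar
bar_areas  <- h$density * bin_widths  # area of each bar

sum(bar_areas)   # total area is 1, up to floating point error
max(h$density)   # greater than 1, because the bars are much narrower than 1
```

The heights are large only because the bars are narrow; the areas are what carry the probability.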

In general you should just ignore the tick labels on the y-axis (it would be better if they were not even plotted); they just distract from the important parts of the plot.

Many people think they want the y-axis tick labels to represent the proportion (or percentage) of observations within each grouping (and that is possible with your own custom axis), but I think this is still a distraction. Consider what happens if you change the number of bars/intervals in the histogram: the overall structure of the histogram stays the same (provided you don't make too drastic a change), but the tick labels on the y-axis change, sometimes by quite a bit. So they are better ignored (or not produced in the first place).
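To see how much the per-bin proportions depend on the binning, here is a small sketch. The helper bin_proportions() is a hypothetical name introduced only for this illustration, and the data are simulated; the break counts 5 and 30 are arbitrary choices.

```r
set.seed(42)
x <- rgamma(327, 5, 3)  # simulated data, for illustration only

# Hypothetical helper: proportion of observations in each bin
# for a suggested number of breaks (nothing is drawn)
bin_proportions <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  h$counts / sum(h$counts)
}

coarse <- bin_proportions(x, 5)   # few, wide bins
fine   <- bin_proportions(x, 30)  # many, narrow bins

max(coarse)  # tallest bar as a proportion with few bins
max(fine)    # tallest bar as a proportion with many bins
```

Both sets of proportions sum to 1, but the scale of the tallest bar, and hence the tick labels, shifts with the number of bins even though the data are the same.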

If you really think that the percentages (or proportions) are needed then the code is as simple as:

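x <- rgamma(327, 5, 3)

# draw the histogram of counts, but suppress the default y-axis
tmp <- hist(x, yaxt='n', ylab='Percent')
# choose round tick values on the percent scale
tmp2 <- pretty( tmp$counts/sum(tmp$counts)*100 )
# place those ticks at the matching count positions
axis(2, at=tmp2*sum(tmp$counts)/100, labels=tmp2)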

That could be easily wrapped into a function if you wanted.
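One possible way to wrap it, as a sketch only: hist_percent() is a hypothetical name, not a function from base R or any package, and it assumes the caller does not pass plot=FALSE or yaxt through the extra arguments.

```r
# Hypothetical helper (not part of base R): histogram with a percent y-axis.
# Extra arguments in ... are passed on to hist().
hist_percent <- function(x, ..., ylab = "Percent") {
  h <- hist(x, yaxt = "n", ylab = ylab, ...)   # draw bars without the y-axis
  n <- sum(h$counts)
  ticks <- pretty(h$counts / n * 100)          # round values on the percent scale
  axis(2, at = ticks * n / 100, labels = ticks)  # map percents back to counts
  invisible(h)                                 # return the histogram object quietly
}

x <- rgamma(327, 5, 3)
res <- hist_percent(x, main = "Histogram of x")
```

Returning the histogram object invisibly mirrors what hist() itself does, so the result can still be inspected afterwards.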

Greg Snow
  • That's what I've read already :/. I just wondered why there isn't a built-in option in hist() to plot real shares on the y-axis. Since people can still decide whether they find them distracting or not, it would be nice to have such control. Thanks anyway! – Martin Schmelzer Nov 28 '12 at 22:57
  • @MartinDabbelJuSmelter, I added an example of how to do it yourself if you want. You could even wrap that into a function. But I think that it should stay requiring a little extra effort so that it is only used by those who have really thought it through. – Greg Snow Nov 29 '12 at 16:51
  • 'probability' is an alias for '!freq' – andreipb Sep 24 '21 at 13:42