Histogram with variable breaks

Question

I'm working on my Masters dissertation and need help with the programming part.

I want to generate a histogram that plots the density of the number of shares bought/sold by corporate insiders.

The problem is that the variable "Amount" has a very wide range, with extreme values as high as 2,589,704. That is far above the mean of 38,000 and the median of 900. The minimum is 1.

Therefore I want to generate a histogram that has variable breaks.

My code looks like this:

hist(myInside$Amount,
     breaks = c(min(myInside$Amount), seq(1000, 10000, 1000), max(myInside$Amount)),
     xlab = "Amounts of shares bought/sold",
     xlim = c(1, 2589704),
     col = "blue",
     freq = FALSE)

The result looks like this:

[Image: Histogram]

There is only a tiny line close to zero in the left corner. The rest is empty and I simply do not know why.

Does anybody have a suggestion so that the classes of the histogram match the data properly? I wanted something like 11 classes: most of the data lies between 1 and 10,000, so that range should be split into classes, and everything higher than 10,000 should be aggregated in the last class.

Thanks a lot for your help everybody.

Possible duplicate of [R - Cut by Defined Interval](https://stackoverflow.com/questions/5746544/r-cut-by-defined-interval) — tjebo, Aug 05 '18 at 13:20
Maybe `breaks = c(seq(min(myInside$Amount), 10000, 1000), max(myInside$Amount))`. — Rui Barradas, Aug 05 '18 at 13:21

score 0 · Answer 1 · answered Aug 06 '18 at 14:21

As suggested by Tjebo, you could cut your data into intervals first:

# simulated data: 1000 normal draws plus one extreme value
myInside <- data.frame(Amount = c(rnorm(1000, 5000, 1000), 250000))
# map each amount to the index (1 to 11) of its interval:
myInside$Transform <- as.numeric(cut(myInside$Amount,
                                     breaks = c(seq(0, 10000, by = 1000),
                                                max(myInside$Amount))))
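To check which interval each class number stands for, one can keep the factor returned by `cut` and tabulate it before converting it to numbers. This is only a minimal sketch on the simulated data: the fixed seed and the names `brks` and `classes` are assumptions introduced here for illustration.

```r
set.seed(42)  # assumption: fixed seed so the simulated data are reproducible
myInside <- data.frame(Amount = c(rnorm(1000, 5000, 1000), 250000))

# Same breaks as above: ten 1000-wide classes plus one overflow class
brks <- c(seq(0, 10000, by = 1000), max(myInside$Amount))

# cut() returns a factor; its levels are the interval labels
# (dig.lab = 6 avoids scientific notation in the labels)
classes <- cut(myInside$Amount, breaks = brks, dig.lab = 6)
levels(classes)  # interval labels such as "(0,1000]"
table(classes)   # number of observations per class

# as.numeric() on the factor yields the class index used for the histogram
head(as.numeric(classes))
```

The extreme value of 250000 ends up in the last class, which is what collapses the long tail into a single bar.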

Calling

hist(myInside$Transform,
     breaks = seq(0.5, 11.5, by = 1),  # one bin per class index 1 to 11
     xlab = "Classes of insider trade sizes",
     col = "blue",
     freq = FALSE)

then gives you:

However, as you can see, the histogram is now hard to interpret. Even if you state what the classes are, it remains a little obscure. Reframing your data in terms of dollar values instead of number of shares might help (and would also make the data more meaningful).
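One way to make the classes easier to read might be to skip `hist` and draw a bar chart with the interval labels on the axis. This is a sketch of an alternative, not the approach above: it shows the share of observations per class rather than a density, it runs on simulated data, and `brks`, `labs` and `shares` are names made up for illustration.

```r
set.seed(42)  # assumption: simulated stand-in for the real data
myInside <- data.frame(Amount = c(rnorm(1000, 5000, 1000), 250000))

# Ten 1000-wide classes, then a single open-ended class above 10,000
brks <- c(seq(0, 10000, by = 1000), Inf)
labs <- c(paste0(seq(0, 9000, by = 1000), "-", seq(1000, 10000, by = 1000)),
          ">10000")

# Share of observations falling into each class
shares <- prop.table(table(cut(myInside$Amount, breaks = brks, labels = labs)))

barplot(shares,
        las = 2,          # rotate the class labels so they fit
        cex.names = 0.7,  # shrink the labels slightly
        col = "blue",
        ylab = "Share of trades")
```

Using `Inf` as the last break means the overflow class does not depend on the largest value in the data, and all bars have the same width no matter how far the tail extends.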
