Right off the bat, I'm a newbie to R and maybe I'm misunderstanding the concept of what my code does vs what I want it to do. Here is the code I've written so far.

his <- hist(x, breaks = seq(floor(min(x)), ceiling(max(x)) + ifelse(ceiling(max(x)) %% 5 != 0, 5, 0), 5))

Here is some sample data:
Autonr                      X
1                           -12
2                            -6
3                           -17
4                             8
5                           -11
6                           -10   
7                            10
8                           -22

I'm not able to upload one of the histograms that did work, but it should show bins that are 5 mm wide, no matter how large the spread of the data. The number of bins should therefore be flexible.

The idea of the code above is to make sure that the outer ranges of my data always fall within neatly defined 5 mm bins. Maybe I've lost track of something, but I can't see why this does not always work. It works for some datasets but not for others.

I get: some 'x' not counted; maybe 'breaks' do not span range of 'x'.

Any help would be greatly appreciated, as I don't want to have to tinker with my breaks and bins every time I get a new dataset to run through this.

  • Welcome to SO. To get help on this site you should include a portion of your data, something that can be easily copy-pasted. A picture of the histogram you currently have / want to generate helps too. Check out [this post](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for more info – astrofunkswag Mar 19 '20 at 03:42
  • Hi, thanks for your reply. I will edit my original post to include these. – NiksAanDeWashand Mar 19 '20 at 03:45
  • Sorry, do you want 5 observations per bin, or – Gregor Thomas Mar 19 '20 at 03:58
  • I would like to have a regular histogram that counts the observations that fall within each five millimeter bin. – NiksAanDeWashand Mar 19 '20 at 04:03

2 Answers

Rather than passing a vector of breaks, you can supply a single value: the number of bins needed, calculated from the range of the data and a bin width of 5.

# Generate uniformly distributed dummy data between 0 and 53
set.seed(5)
x <- runif(1000, 0, 53)

# Plot histogram with binwidths of 5.
hist(x, breaks = ceiling(diff(range(x)) / 5 ))
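
If the bins must be exactly 5 wide, an explicit vector of breaks is another option. The sketch below uses the sample data from the question and assumes the values are already in mm. It first rebuilds the breaks from the question to show why they can miss the maximum, then aligns both ends to multiples of the bin width. The names `bin_width`, `lo`, `hi`, `old_breaks` and `new_breaks` are only for illustration.

```r
# Sample data from the question (assumed to be in mm)
x <- c(-12, -6, -17, 8, -11, -10, 10, -22)

# Breaks as built in the question: the sequence starts at floor(min(x)) = -22
# and steps by 5, so it ends at 8 and the value 10 falls outside the breaks
old_breaks <- seq(floor(min(x)),
                  ceiling(max(x)) + ifelse(ceiling(max(x)) %% 5 != 0, 5, 0),
                  5)
max(old_breaks) < max(x)   # TRUE, which is what triggers the error

# Align both ends to multiples of the bin width instead
bin_width <- 5
lo <- floor(min(x) / bin_width) * bin_width     # -25
hi <- ceiling(max(x) / bin_width) * bin_width   # 10
new_breaks <- seq(lo, hi, by = bin_width)

# plot = FALSE only returns the counts; drop it to draw the histogram
h <- hist(x, breaks = new_breaks, plot = FALSE)
h$counts
```

One edge case this sketch does not handle: if every value in `x` is the same multiple of 5, `lo` equals `hi` and the sequence contains a single break.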

Ritchie Sacramento
  • Thanks for your reply. I have included some more info in my post. I'm not sure I understand what you mean. I am very green with R. x in my code points to a column much like the sample data I added in my original post. – NiksAanDeWashand Mar 19 '20 at 04:05
  • Run the code above using your data and see if it gives the result you expect. I'm assuming that the unit of your x variable is already mm. – Ritchie Sacramento Mar 19 '20 at 04:16
  • That's it! I have no idea what you did in your code, but it does the trick. Would it be too much to ask to have you explain it to me in another comment? I'd like to learn. – NiksAanDeWashand Mar 19 '20 at 04:25
  • It just calculates the difference between the minimum and maximum values of x and divides this by the desired bin width. Rounded up, the result is how many bins are needed to bin the data at the desired width. This value is then passed to the breaks argument of the `hist()` function, which interprets a single value as the (suggested) number of bins. – Ritchie Sacramento Mar 19 '20 at 04:46
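
The calculation described in the last comment can be followed step by step with the sample data from the question. This is only a sketch: the intermediate names (`data_range`, `spread`, `n_bins`) are made up for illustration. The documentation of `hist()` describes a single number passed to `breaks` as a suggestion, with the breakpoints set to "pretty" values, so it may be worth inspecting the breaks that were actually used.

```r
# Sample data from the question (assumed to be in mm)
x <- c(-12, -6, -17, 8, -11, -10, 10, -22)
bin_width <- 5

data_range <- range(x)                  # minimum and maximum: -22 and 10
spread <- diff(data_range)              # 32
n_bins <- ceiling(spread / bin_width)   # 32 / 5 = 6.4, rounded up to 7

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = n_bins, plot = FALSE)

# Inspect the breakpoints that hist() chose and the resulting bin widths
h$breaks
diff(h$breaks)
```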

For the sake of completeness, here is another approach which uses breaks_width() from the scales package. scales is part of Hadley Wickham's ggplot2 ecosystem.

# create sample data
set.seed(5)
x <- runif(1000, 17, 53)

# plot histogram
hist(x, breaks = scales::breaks_width(5)(range(x)))

Explanation

scales::breaks_width(5) creates a function which is then called with the vector of minimum and maximum values of x as returned by range(x). So,

scales::breaks_width(5)(range(x))

returns a vector of breaks with the required bin size of 5:

[1] 15 20 25 30 35 40 45 50 55

The nice thing is that we have full control of the bins. E.g., we can ask for weird things like a bin size of 3 which is offset by 2:

scales::breaks_width(3, 2)(range(x))
 [1] 17 20 23 26 29 32 35 38 41 44 47 50 53 56
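
With custom or offset breaks it is easy to end up with a vector that does not cover the data, which is exactly what produces the "some 'x' not counted" error from the question. A small check before calling `hist()` can catch this. The helper `breaks_span()` below is hypothetical (it is not part of scales or base R), and the two break vectors are hand-made examples for the question's sample data.

```r
# Hypothetical helper: TRUE if the breaks cover the full range of x
breaks_span <- function(breaks, x) {
  min(breaks) <= min(x) && max(breaks) >= max(x)
}

# Sample data from the question (assumed to be in mm)
x <- c(-12, -6, -17, 8, -11, -10, 10, -22)

good_breaks <- seq(-23, 10, by = 3)   # runs from -23 to 10, covers -22..10
bad_breaks  <- seq(-20, 10, by = 3)   # starts at -20, so -22 is left out

breaks_span(good_breaks, x)
breaks_span(bad_breaks, x)

# Only build the histogram when the breaks are safe to use
if (breaks_span(good_breaks, x)) {
  h <- hist(x, breaks = good_breaks, plot = FALSE)
}
```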
Uwe