
I am having trouble with the histogram function in R.

I have a data set recording the number of individuals in each set, with values ranging from 0 to 17. I want to group them into categories (0-4, 5-9, 10-14, etc.), which seems fair because each category then covers exactly 5 integer values.

However, when I use the hist function in R, it automatically categorises them into 0-5, 6-10, 11-15, etc., which is not what I want. I have tried "seq" and "break" in the hist function, but it didn't work well for me.

Do you have any ideas or suggestions for managing my histogram? Do you think it's alright to use 0-4, 5-9, etc., or is R's default binning the better way to handle my data?

I don't want to split the data into any smaller bins, because I want to run a chi-squared test on it, and having too many categories would result in smaller expected counts.
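To illustrate the expected-count concern, here is a minimal sketch that assumes, purely for illustration, a goodness-of-fit test against equal bin probabilities (the default `p` of `chisq.test()`); the question does not say what the actual null hypothesis is. With the 15 values from the table below split into 4 bins, each expected count is 15/4 = 3.75, which is below the usual rule of thumb of 5, so R warns that the approximation may be incorrect. Under this assumption, k bins give n/k expected per bin, so more bins only make it worse.

```r
# "Individuals" column from the table in the question
data <- c(2, 5, 9, 6, 17, 2, 13, 6, 0, 1, 2, 1, 2, 2, 15)

# Observed counts in the bins 0-4, 5-9, 10-14, 15-19
counts <- hist(data, breaks = seq(-0.5, 19.5, 5), plot = FALSE)$counts

# ASSUMPTION: uniform null, i.e. chisq.test's default p = rep(1/4, 4).
# Expect the warning "Chi-squared approximation may be incorrect".
res <- chisq.test(counts)
res$expected  # 3.75 3.75 3.75 3.75, i.e. length(data) / length(counts)
```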

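data <- c(2, 5, 9, 6, 17, 2, 13, 6, 0, 1, 2, 1, 2, 2, 15)  # "Individuals" column of the table below
hist(data, main = "Histogram", xlab = "individuals",
     ylab = "Count", border = "black", col = "red", xlim = c(0, 20), ylim = c(0, 10))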
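For reference, a minimal sketch of what the default call above does with these values (assuming `data` is the "Individuals" column of the table below). By default `hist()` uses `breaks = "Sturges"`, which picks the bin edges with `pretty()`, and it uses `right = TRUE` and `include.lowest = TRUE`. The bins are therefore [0,5], (5,10], (10,15], (15,20], which for integer data reads as 0-5, 6-10, 11-15, 16-20.

```r
# "Individuals" column from the table in the question
data <- c(2, 5, 9, 6, 17, 2, 13, 6, 0, 1, 2, 1, 2, 2, 15)

# Default binning: Sturges' rule + pretty(), right-closed bins,
# with the lowest break included in the first bin
h <- hist(data, plot = FALSE)
h$breaks  # 0 5 10 15 20
h$counts  # 9 3 2 1  -> counts in [0,5], (5,10], (10,15], (15,20]
```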


Set Individuals
1   2
2   5
3   9
4   6
5   17
6   2
7   13
8   6
9   0
10  1
11  2
12  1
13  2
14  2
15  15
divibisan
MatCode
  • Take a look at these questions: [Exact number of bins in Histogram in R](https://stackoverflow.com/q/16931895/8366499) and [Binning data in R](https://stackoverflow.com/q/24359863/8366499) – divibisan Mar 20 '19 at 14:48

1 Answer


You can use the breaks argument of the hist function to configure your bins. It takes a numeric vector of break points, i.e. the bin boundaries. So if you have integer data and want bins 0-4, 5-9, ..., you can place the boundaries halfway between integers (shown here with data between 0 and 10):

> seq(-0.5, max(data)+5, 5)
[1] -0.5  4.5  9.5 14.5
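A short sketch of why the -0.5 offset works, using the data from the question: because every break point is a half-integer, no integer value can sit on a bin boundary, so the counts are the same whether the bins are right-closed (`right = TRUE`, the default) or left-closed (`right = FALSE`).

```r
# "Individuals" column from the table in the question
data <- c(2, 5, 9, 6, 17, 2, 13, 6, 0, 1, 2, 1, 2, 2, 15)
brks <- seq(-0.5, max(data) + 5, 5)  # -0.5  4.5  9.5 14.5 19.5

h_right <- hist(data, breaks = brks, plot = FALSE)                # right-closed bins (default)
h_left  <- hist(data, breaks = brks, right = FALSE, plot = FALSE) # left-closed bins

h_right$counts                            # 8 4 1 2  -> bins 0-4, 5-9, 10-14, 15-19
identical(h_right$counts, h_left$counts)  # TRUE: boundary side never matters here
```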

And then with any other arguments of your choosing:

hist(data, breaks=seq(-0.5,max(data)+5,5))
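If you also need the binned counts themselves (for example as the observed frequencies for the chi-squared test mentioned in the question), `cut()` accepts the same break vector. A minimal sketch, again using the question's data; building the "0-4", "5-9", ... labels from the break points is just one illustrative choice, and `bin_labels` is a hypothetical name:

```r
# "Individuals" column from the table in the question
data <- c(2, 5, 9, 6, 17, 2, 13, 6, 0, 1, 2, 1, 2, 2, 15)
brks <- seq(-0.5, max(data) + 5, 5)

# Label each bin by the integers it contains: "0-4", "5-9", "10-14", "15-19"
bin_labels <- paste(head(brks, -1) + 0.5, tail(brks, -1) - 0.5, sep = "-")

# cut() uses right-closed intervals by default, like hist()
binned <- cut(data, breaks = brks, labels = bin_labels)
tab <- table(binned)
tab  # 0-4: 8, 5-9: 4, 10-14: 1, 15-19: 2
```

The resulting counts match `hist(data, breaks = brks, plot = FALSE)$counts`, and `tab` can be passed directly to `chisq.test()`.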
Oli