
I would like to group some data into several categories for a boxplot in R. I obtained my groups like this:

cut(60:95, breaks = c(60,64,68,72,76,80,85,90,95))

Here's my output:

(60,64] (60,64] (60,64] (60,64] (64,68] (64,68] (64,68] (64,68] 
(68,72] (68,72] (68,72] (68,72] (72,76] (72,76] (72,76]
(72,76] (76,80] (76,80] (76,80] (76,80] (80,85] (80,85] (80,85] (80,85] 
(80,85] (85,90] (85,90] (85,90] (85,90] (85,90] (90,95]
(90,95] (90,95] (90,95] (90,95]

But the categories that I would actually like to have are:

(60,64] (60,64] (60,64] (60,64] (65,68] (65,68] (65,68] (65,68] etc

Does anyone know how I can get my desired outputs?

Rob John
  • So you do not want 65 to be within any interval at all? Or what is the essential difference between the standard function's result and the result you want to get? – Bernhard Jan 18 '18 at 13:39
  • To add to @Bernhard's question, what about the interval `(64, 65]`? Or will the values be always integers? BTW, the first value, `60` is not an element of the first interval, so `cut` gives `NA`. If you use `include.lowest = TRUE` the problem seems to be solved, then it is just a matter of factor labels. – Rui Barradas Jan 18 '18 at 13:44
  • The symbols in `(60,64]` are not arbitrary. `(` means the value is excluded from the bin. `]` means the value is included in the bin. – CPak Jan 18 '18 at 13:57
  • I have a `NA` as the first element. However, it seems that you want either 64 or 65 to be in the `(65, 68]` range. If you want that, you are wrong. – nicola Jan 18 '18 at 13:58
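
The points raised in these comments can be checked directly. The following is a minimal sketch, not part of the original thread; the variable names `x`, `brks`, `g_default` and `g_closed` are chosen here for illustration only.

```r
x <- 60:95
brks <- c(60, 64, 68, 72, 76, 80, 85, 90, 95)

# Default behaviour: intervals are open on the left and closed on the right,
# so 60 lies outside (60,64] and is returned as NA.
g_default <- cut(x, breaks = brks)
sum(is.na(g_default))

# include.lowest = TRUE closes the first interval on the left as well,
# so 60 is assigned to the first bin.
g_closed <- cut(x, breaks = brks, include.lowest = TRUE)
levels(g_closed)[1]
table(g_closed)
```

With `include.lowest = TRUE` every value from 60 to 95 is assigned to a bin, and 65 lands in `(64,68]`, which is where the interval notation says it belongs.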

1 Answer


First, define the lower (inf) and upper (sup) limits of each interval:

breaks_lim_inf<-c(60,65,69,73,77,81,86,91)
breaks_lim_sup<-c(64,68,72,76,80,85,90,95)

Then build the interval labels from these limits:

list_int_unique<-as.factor(paste0("(",breaks_lim_inf,",",breaks_lim_sup,"]"))
list_int_unique
[1] (60,64] (65,68] (69,72] (73,76] (77,80] (81,85] (86,90] (91,95]
Levels: (60,64] (65,68] (69,72] (73,76] (77,80] (81,85] (86,90] (91,95]

Next, assign one of these labels to each number between 60 and 95:

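# the values to classify (named `values` to avoid masking base::list)
values<-seq(60,95)
# findInterval() returns, for each value, the index of the last lower limit <= value
list_int<-list_int_unique[findInterval(values,breaks_lim_inf)]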
list_int

This gives your desired output:

 [1] (60,64] (60,64] (60,64] (60,64] (60,64] (65,68] (65,68] (65,68] (65,68] (69,72] (69,72] (69,72] (69,72] (73,76] (73,76]
[16] (73,76] (73,76] (77,80] (77,80] (77,80] (77,80] (81,85] (81,85] (81,85] (81,85] (81,85] (86,90] (86,90] (86,90] (86,90]
[31] (86,90] (91,95] (91,95] (91,95] (91,95] (91,95]
Levels: (60,64] (65,68] (69,72] (73,76] (77,80] (81,85] (86,90] (91,95]

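Note that this use of "(" and "]" is unconventional: in standard interval notation, (65,68] does not contain 65. I suggest reading this SO question to better understand what "(" and "]" mean: https://stackoverflow.com/questions/4396290/what-does-this-square-bracket-and-parenthesis-bracket-notation-mean-first1-last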
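
If the values are always integers, the same grouping can probably also be obtained from `cut` itself by supplying custom labels. This is only a sketch under that assumption: the names `lower`, `upper`, `lab` and `groups`, and the `"60-64"` label format, are illustrative choices and not part of the answer above.

```r
x <- 60:95
lower <- c(60, 65, 69, 73, 77, 81, 86, 91)
upper <- c(64, 68, 72, 76, 80, 85, 90, 95)

# Labels that state both endpoints as inclusive, e.g. "60-64", "65-68"
# (hypothetical format, chosen to avoid the misleading "(65,68]")
lab <- paste0(lower, "-", upper)

# Break at the first lower limit and at every upper limit;
# include.lowest = TRUE keeps 60 in the first bin.
groups <- cut(x, breaks = c(lower[1], upper), labels = lab,
              include.lowest = TRUE)
table(groups)

# For integer input, the bin indices agree with the findInterval() approach
identical(as.integer(groups), findInterval(x, lower))
```

Labels of this form avoid implying that 65 is excluded from its own group. For non-integer data, a value such as 64.5 would fall into the "65-68" group here, so the labels would need rethinking.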

Terru_theTerror
  • While this might answer the OP's question, it should be stressed that decoding 65 as belonging to the `(65,68]` range is quite bizarre. – nicola Jan 18 '18 at 13:59
  • I'm with you. I suggest that Rob read https://stackoverflow.com/questions/4396290/what-does-this-square-bracket-and-parenthesis-bracket-notation-mean-first1-last to better understand the use of "(" and "]". – Terru_theTerror Jan 18 '18 at 14:06
  • Apologies for getting back late! Thanks so much @Terru_theTerror. It works as expected and I know more now about those notations. – Rob John Jan 30 '18 at 15:00