I am struggling to figure out how to use the cut() function to split my data into 12-month intervals. I read the post "R - Cut by Defined Interval", but it does not cover what I am looking for.

Say I have a vector named months whose values range from less than a year (< 12 months) up to 50 months.

set.seed(50); sample(50) -> months

I want to use the cut() function to count how many values fall in each year, including those below 12 months.

> cut(months, breaks =  seq(12,50, by= 12))-> output
> output
 [1] (24,36] (12,24] <NA>    (36,48] (12,24] <NA>    (24,36] (24,36] <NA>    <NA>   
[11] (12,24] <NA>    (24,36] (36,48] (36,48] (36,48] (24,36] (12,24] (36,48] <NA>   
[21] (12,24] (36,48] (12,24] (12,24] <NA>    (12,24] (12,24] (24,36] <NA>    <NA>   
[31] (12,24] (36,48] (24,36] (36,48] <NA>    <NA>    (36,48] (12,24] (36,48] (24,36]
[41] (36,48] (12,24] (24,36] <NA>    <NA>    (24,36] <NA>    (24,36] (24,36] (36,48]
Levels: (12,24] (24,36] (36,48]

> table(output)
output
(12,24] (24,36] (36,48] 
     12      12      12

Questions

1- How can I get the count for < 12 months while keeping the 12-month intervals?

I tried this, but it does not work:

> cut(months, breaks =  seq(1,12,50, by= 12))-> output

2- How can I make a hist() plot from this data?

Thanks,

Daniel
  • @GGamba, it does not work perfectly: `output (1,13] (13,25] (25,37] (37,49] 12 12 12 12`. I need the breaks to fall on 12, not 13! – Daniel Feb 14 '17 at 16:35
  • `seq(0, 50, by = 12)` – GGamba Feb 14 '17 at 16:36
  • `seq(0, 50, by = 12)` (sorry, posted at the same time as @GGamba). Also, what's a hiso() plot? – Patrick Williams Feb 14 '17 at 16:36
  • If you mean a `hist()` plot, then you first have to label your cuts (`labels = 1:4`, for example). Then you could do `hist(as.numeric(output))`. However, since all your bins are the same size in this example, it's not very informative. – Patrick Williams Feb 14 '17 at 16:40
  • See my comment above. You first have to pass labels to your cut function (as many labels as there are intervals). Then use the hist() function. For more info, look at the cut and hist function documentation. – Patrick Williams Feb 14 '17 at 16:43
  • `output <- cut(months, breaks = seq(0,50, by= 12), labels = c("<12","12-24","24-35","36-50"))` – Patrick Williams Feb 14 '17 at 16:45

2 Answers

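set.seed(50)
months <- sample(50)

# seq(0, 50, by = 12) gives the breaks 0, 12, 24, 36, 48, i.e. the bins
# (0,12], (12,24], (24,36], (36,48]; the values 49 and 50 become NA.
output <- cut(months, breaks = seq(0,50, by= 12), labels = c("<12","12-24","24-35","36-50"))

# Convert the factor to its integer codes (1-4) so hist() can plot it.
hist(as.numeric(output))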

You'll have to edit the axis values on the histogram manually, since they will be labeled 1 to 4 (the factor's integer codes). As I mentioned in my comment, the histogram isn't very informative, because every bin holds the same number of values.
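
If the first bin should hold strictly fewer than 12 months and the values above 48 should not be dropped, one possible variation is to use left-closed intervals with an open-ended last bin. This is only a sketch: the final break of Inf and the label names below are assumptions for illustration, not part of the original answer.

```r
set.seed(50)
months <- sample(50)

# Left-closed bins [0,12), [12,24), [24,36), [36,48), [48,Inf).
# The Inf break is an assumption so that 48, 49 and 50 are kept, not NA.
breaks <- c(seq(0, 48, by = 12), Inf)

# Hypothetical labels chosen to match the left-closed bins.
labels <- c("<12", "12-23", "24-35", "36-47", "48+")

# right = FALSE makes each interval include its lower bound only.
output <- cut(months, breaks = breaks, labels = labels, right = FALSE)

# Because months is a permutation of 1..50, the counts are 11, 12, 12, 12, 3.
table(output)
```

With right = FALSE, a value of exactly 12 lands in the second bin, so the "<12" label describes the first bin accurately.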

Patrick Williams

geom_col() will give you a clearer plot (strictly a bar chart rather than a histogram), since the data are already in a frequency table.

library(dplyr)
library(ggplot2)

set.seed(50)
months <- sample(50)

output <- cut(months, breaks = seq(0,50, by= 12), labels = c("<12","12-24","24-35","36-50"))

table(output) %>% 
  as.data.frame() %>% 
  ggplot(aes(x = output, y = Freq)) + 
  geom_col()
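
If you would rather not load extra packages, a similar bar chart can probably be drawn with base R's barplot(), which accepts the frequency table directly. This is a minimal sketch that reuses the same breaks and labels as above; the axis titles are assumptions added for readability.

```r
set.seed(50)
months <- sample(50)

output <- cut(months, breaks = seq(0, 50, by = 12),
              labels = c("<12", "12-24", "24-35", "36-50"))

# table() counts the values in each bin; NA values (49 and 50) are ignored.
counts <- table(output)

# One bar per bin, labeled with the factor levels.
barplot(counts, xlab = "Months", ylab = "Count")
```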

Joe