I am trying to write a function, or use cut, to assign a grouping variable to some date data when those dates are close (with "close" defined by the user). For example, I would like to create a common grouping variable for samples that were collected on consecutive dates. I thought cut would work here, but then I realized that cut doesn't group values when they are close; rather, it creates a series of groups based on a fixed sequence of breaks.
So take this dataframe for example:
df <- structure(list(Num = c(0.888401849195361, 0.185766335576773,
0.493163562379777, 0.13070688676089, 0.484760325402021, 0.603240836178884,
0.893201333936304, 0.641203448642045, 0.16957180458121, 0.0101411847863346
), Date = structure(c(10592, 10597, 10598, 10605, 10606, 10608,
10609, 10616, 10617, 10618), class = "Date"), day = c(1L, 6L,
7L, 14L, 15L, 17L, 18L, 25L, 26L, 27L)), .Names = c("Num", "Date",
"day"), row.names = c(NA, -10L), class = "data.frame")
If I were to apply cut as I understand its usage, like this:
df$cutVar <- cut(df$day, breaks= seq(0, 31, by = 1), right=TRUE)
I would be left with breaks that fall right between values that I'd prefer to be grouped together. For example, the 6th and 7th should be grouped together based on their proximity to each other, and likewise the 14th and 15th, and so on.
> df
          Num       Date day  cutVar
1  0.88840185 1999-01-01   1   (0,1]
2  0.18576634 1999-01-06   6   (5,6]
3  0.49316356 1999-01-07   7   (6,7]
4  0.13070689 1999-01-14  14 (13,14]
5  0.48476033 1999-01-15  15 (14,15]
6  0.60324084 1999-01-17  17 (16,17]
7  0.89320133 1999-01-18  18 (17,18]
8  0.64120345 1999-01-25  25 (24,25]
9  0.16957180 1999-01-26  26 (25,26]
10 0.01014118 1999-01-27  27 (26,27]
So the basic question here is: how can I group a continuous variable (a date in this instance) so that values that are close to each other (with "close" defined by the user) end up in the same level of a factor?
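To make "close" concrete, here is a rough sketch of the kind of grouping I have in mind, not necessarily the best way to do it. It assumes the dates are already sorted and starts a new group whenever the gap to the previous date exceeds a user-chosen threshold. The names group_close and max_gap are hypothetical, made up for this illustration.

```r
# Hypothetical helper: start a new group whenever the gap to the previous
# value is larger than max_gap. Assumes x is sorted in ascending order.
group_close <- function(x, max_gap = 1) {
  gaps <- diff(as.numeric(x))        # distance between neighbouring values
  cumsum(c(TRUE, gaps > max_gap))    # increment the group id at every large gap
}

# Same dates as in df above, rebuilt here so the snippet runs on its own
dates <- as.Date("1999-01-01") + c(0, 5, 6, 13, 14, 16, 17, 24, 25, 26)

grp1 <- group_close(dates, max_gap = 1)  # only consecutive days count as close
grp2 <- group_close(dates, max_gap = 2)  # allow a one-day hole within a group

data.frame(Date = dates, grp1 = factor(grp1), grp2 = factor(grp2))
```

With max_gap = 1 the 6th and 7th share a group, as do the 14th and 15th; with max_gap = 2 the 14th through the 18th collapse into a single group.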