This sounds simple, but I'm having a hard time figuring it out. I have a data frame (S) with one column of numeric months (1-12, i.e. Jan-Dec):

S$month
 [1]  6  7 12  1  2  3  4  5  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10
[27] 11 12  2  3  4  6 10 11 12  1  2  3  5  6  7  7 

I'd like to split the data frame into a list such that consecutive months are grouped together, as shown below:

S[[1]]$month
[1]  6  7
S[[2]]$month
[1]  12  1  2  3  4  5  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10
[25] 11 12
S[[3]]$month
[1] 2  3  4
S[[4]]$month
[1] 6
S[[5]]$month
[1] 10 11 12  1  2  3
S[[6]]$month
[1] 5  6  7  7

Note that some months are repeated because more than one measurement was taken.

Is there an easy way to do this other than writing out each subset by hand, like S[[1]]<-S[c(1:2),]; S[[2]]<-S[c(3:28),]; and so on? That's quite inefficient!

ToNoY

2 Answers

You can use cumsum and diff to create a group variable and use the split function to turn your vector into a list of consecutive months:

month <- S$month
split(month, cumsum(!c(1, diff(month)) %in% c(0, 1, -11)))
# A new group starts wherever the difference between neighbouring months is not one of:
#   0   (the same month repeated),
#   1   (the next month), or
#   -11 (December followed by January, the only consecutive pair with that difference).

# $`0`
# [1] 6 7

# $`1`
# [1] 12  1  2  3  4  5  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12

# $`2`
# [1] 2 3 4

# $`3`
# [1] 6

# $`4`
# [1] 10 11 12  1  2  3

# $`5`
# [1] 5 6 7 7
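
The call above splits the month vector only, whereas the question asks for the data frame itself to be split. A minimal sketch of that variant follows, assuming the same grouping rule; the names `grp` and `S_list` are made up for illustration, and `S` is rebuilt here from the values shown in the question so the snippet runs on its own.

```r
# Sample data with the same 42 month values as in the question
S <- data.frame(month = c(6, 7, 12, 1:5, 5:12, 1:12, 2:4, 6, 10:12, 1:3, 5:7, 7))

# A new run starts wherever the step is not 0 (repeat), 1 (next month)
# or -11 (December -> January)
grp <- cumsum(!c(1, diff(S$month)) %in% c(0, 1, -11))

# split() on a data frame keeps all columns of each row;
# unname() drops the "0", "1", ... labels so elements are accessed by position
S_list <- unname(split(S, grp))

S_list[[3]]$month
```

With more columns in `S`, each list element would carry those columns along for the rows in its run.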
Psidom

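We can also do this without hard-coding the December-to-January difference of -11: add 12 to each month for every December that precedes it, so the values keep increasing across year boundaries, and start a new group wherever the difference between neighbours is greater than 1.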

with(S, split(month, cumsum(c(TRUE, diff(cumsum(c(FALSE, 
         (month==12)[-length(month)]))*12 + month)>1))))
#$`1`
#[1] 6 7

#$`2`
#[1] 12  1  2  3  4  5  5  6  7  8  9 10 11 12  1  2  3  4  5  6  7  8  9 10 11 12

#$`3`
#[1] 2 3 4

#$`4`
#[1] 6

#$`5`
#[1] 10 11 12  1  2  3

#$`6`
#[1] 5 6 7 7
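
The one-liner above may be easier to follow when broken into steps. The sketch below does the same computation with intermediate variables; the names `years_passed`, `unrolled` and `grp` are made up for illustration, and `month` is rebuilt from the question's values.

```r
# Same 42 month values as S$month in the question
month <- c(6, 7, 12, 1:5, 5:12, 1:12, 2:4, 6, 10:12, 1:3, 5:7, 7)

# Number of Decembers seen strictly before each position
years_passed <- cumsum(c(FALSE, (month == 12)[-length(month)]))

# "Unrolled" month index: 12 is added for every December already passed,
# so December -> January becomes 12 -> 13 instead of 12 -> 1
unrolled <- years_passed * 12 + month

# A step larger than 1 in the unrolled index starts a new group
grp <- cumsum(c(TRUE, diff(unrolled) > 1))

head(unrolled, 6)
# [1]  6  7 12 13 14 15

split(month, grp)
```

One caveat with this rule: a backward jump that does not pass through December (say, 7 followed directly by 2) gives a negative difference and so would not start a new group. That case does not occur in the sample data.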

data

S <- structure(list(month = c(6, 7, 12, 1, 2, 3, 4, 5, 5, 6, 7, 8, 
9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 2, 3, 4, 
6, 10, 11, 12, 1, 2, 3, 5, 6, 7, 7)), .Names = "month", row.names = c(NA, 
-42L), class = "data.frame")
akrun