
Is there an efficient way to split a data frame into a list based on runs of identical consecutive elements in a column, keeping the original row order within each list element, as follows?

The data frame:

X__1
S003
S003
S003
S006
S006
S011
S007
S007
S003
S003
S005
S006

Into :

$`1`
S003
S003
S003

$`2`
S006
S006

$`3`
S011

$`4`
S007
S007

$`5`
S003
S003

$`6`
S005

$`7`
S006

I tried split(df, interaction(df$X__1)), but this groups the rows by distinct value rather than by consecutive run, as follows:

$`1`
S003
S003
S003
S003
S003

$`2`
S005

$`3`
S006
S006
S006

$`4`
S007
S007

$`6`
S011

Thanks for the help :)

Sofiane M'barki
  • Related: [Split vector into consecutive runs](https://stackoverflow.com/questions/31003568/split-vector-in-consecutive-runs). Although the first alternative by @akrun deals specifically with numeric vectors, the last two alternatives also work on non-numeric vectors (like here). – Henrik Jan 04 '18 at 15:30

2 Answers


We can use the rleid function from data.table, which assigns an id to each run of identical consecutive values, and split on that id, i.e.

split(df, data.table::rleid(df$X__1))
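
If data.table is not available, the same run ids can be sketched in base R with rle(). This is only an illustration under the assumption that the column has no missing values; the names `r`, `run_id` and `res` are made up for this example and do not come from the original answer.

```r
# Sample data from the question
df <- data.frame(
  X__1 = c("S003", "S003", "S003", "S006", "S006", "S011",
           "S007", "S007", "S003", "S003", "S005", "S006"),
  stringsAsFactors = FALSE
)

# rle() reports the length of each run of identical consecutive values
r <- rle(df$X__1)

# Repeat each run's index by its length, giving one id per row
run_id <- rep(seq_along(r$lengths), r$lengths)

# One data frame per run, rows kept in their original order
res <- split(df, run_id)
```

Here `res` should have one element per run (seven for the sample data), with names "1" to "7".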
Sotos

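Another way is to use cumsum: flag each row whose value differs from the previous row, then take the cumulative sum of those flags to get a run id.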

split(df, cumsum(c(1L, df$X__1[-nrow(df)] != df$X__1[-1])))
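
To see why the cumsum approach works, it may help to build the grouping vector step by step. The following is a sketch on the question's sample data; the names `x`, `changed`, `grp` and `res2` are hypothetical and only used for illustration.

```r
# Sample data from the question
df <- data.frame(
  X__1 = c("S003", "S003", "S003", "S006", "S006", "S011",
           "S007", "S007", "S003", "S003", "S005", "S006"),
  stringsAsFactors = FALSE
)

x <- df$X__1

# TRUE wherever a value differs from the one just before it (length n - 1)
changed <- x[-1] != x[-length(x)]

# Start at group 1 and move to the next group at every change
grp <- cumsum(c(1L, changed))

# Split on the run id; rows stay in their original order
res2 <- split(df, grp)
```

Note that `!=` returns NA when either side is NA, so this sketch assumes the column contains no missing values.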

DATA

df <-
structure(list(X__1 = c("S003", "S003", "S003", "S006", "S006",
"S011", "S007", "S007", "S003", "S003", "S005", "S006")),
.Names = "X__1", class = "data.frame", row.names = c(NA, -12L))
Rui Barradas