I have a vector like this:

 x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

I want to split it into consecutive runs that each end with TRUE, for example:

FALSE FALSE FALSE TRUE
FALSE FALSE TRUE
FALSE FALSE FALSE TRUE
FALSE TRUE

I have tried using loops, and tried taking the 5 values before each TRUE, but because the runs have different lengths I am not getting the desired result. This is a simplified, reproducible example; the original data has more irregularities. Any solution will be highly appreciated.

tospig
R Vij

2 Answers


You can achieve what you want using only two lines of code:

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

> splitAt(x, which(x == "TRUE") + 1)
[[1]]
[1] "FALSE" "FALSE" "FALSE" "TRUE" 

[[2]]
[1] "FALSE" "FALSE" "TRUE" 

[[3]]
[1] "FALSE" "FALSE" "FALSE" "TRUE" 

[[4]]
[1] "FALSE" "TRUE" 

Data:

x <- c("FALSE", "FALSE", "FALSE", "TRUE",
       "FALSE", "FALSE", "TRUE",
       "FALSE", "FALSE", "FALSE", "TRUE",
       "FALSE", "TRUE")

Credit goes to this great SO answer, which introduced the very useful splitAt() function used above.
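Note that which() accepts only logical input, so a character vector like the one under Data needs a comparison such as x == "TRUE" first. A minimal sketch in base R of the same split on both representations; the names x_chr, x_lgl, res_chr and res_lgl are illustrative and not part of the original answer:

```r
# splitAt() as defined in the answer above
splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

# Character version of the data (illustrative name)
x_chr <- c("FALSE", "FALSE", "FALSE", "TRUE",
           "FALSE", "FALSE", "TRUE",
           "FALSE", "FALSE", "FALSE", "TRUE",
           "FALSE", "TRUE")

# Logical version of the same data (illustrative name)
x_lgl <- as.logical(x_chr)

# A new group starts one position after each TRUE
res_chr <- splitAt(x_chr, which(x_chr == "TRUE") + 1)
res_lgl <- splitAt(x_lgl, which(x_lgl) + 1)

# Group sizes for the logical input
lengths(res_lgl)
```

If the vector ends with FALSE values after the last TRUE, this approach returns them as a final group that does not end in TRUE, so that case may need separate handling.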

Community
Tim Biegeleisen
  • Variation on a theme, assuming `x <- as.logical(x)` - `split(x, sum(x) - rev(cumsum(rev(x))) )` – thelatemail Feb 22 '16 at 05:10
  • Can I save it to a data frame? – R Vij Feb 24 '16 at 07:21
  • What structure do you want? The output you specified in your OP does not seem to lend itself well to a data frame. A data frame can easily be converted to a list (it _is_ a list), but not vice-versa. – Tim Biegeleisen Feb 24 '16 at 07:27
  • OK, thanks for the help. I have found a way to split the list and convert it to a data frame using the transpose function – R Vij Feb 25 '16 at 05:05
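On the data-frame question raised in the comments: the groups have different lengths, so they need padding before they fit a rectangular structure. The following is only one possible sketch in base R, not the approach the asker ended up using (that code is not shown); the names groups, width, padded and df are illustrative:

```r
x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
       FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Group index from the comment above: it increases by one after each TRUE
groups <- split(x, sum(x) - rev(cumsum(rev(x))))

# Pad every group on the left with NA so all rows have the same length
# (left-padding keeps the closing TRUE in the last column)
width  <- max(lengths(groups))
padded <- lapply(groups, function(g) c(rep(NA, width - length(g)), g))

# One row per group
df <- as.data.frame(do.call(rbind, padded))
df
```

Padding on the right instead would keep each group's first value in the first column; which side to pad is a design choice that depends on how the rows are used later.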

This can be done with a single lapply call in one line:

lapply(diff(c(0, which(x))), function(x) c(rep(FALSE, (x-1)), TRUE))

#[[1]]
#[1] FALSE FALSE FALSE  TRUE

#[[2]]
#[1] FALSE FALSE  TRUE

#[[3]]
#[1] FALSE FALSE FALSE  TRUE

#[[4]]
#[1] FALSE  TRUE

Explanation

  • which(x) gives us the position of the TRUE values (4, 7, 11, 13)
  • starting from 0, we take the difference between consecutive TRUE positions, which is the length of each run (the count of FALSE values plus the closing TRUE) - diff(c(0, which(x))) - 4 3 4 2
  • For each of these values we want a vector of length x (here x is the argument of the anonymous function, not the original vector), with x - 1 FALSE values and 1 TRUE - c(rep(FALSE, (x-1)), TRUE)
  • the lapply does this for each of the 4 3 4 2 values, and returns a list
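The steps above can be sketched one at a time, which makes the intermediate values easy to inspect. The variable names true_pos, run_len and runs are illustrative, and the anonymous function uses n instead of x to avoid shadowing the input vector:

```r
x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
       FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

true_pos <- which(x)              # positions of TRUE: 4 7 11 13
run_len  <- diff(c(0, true_pos))  # length of each run ending in TRUE: 4 3 4 2

# Rebuild each run: (n - 1) FALSE values followed by one TRUE
runs <- lapply(run_len, function(n) c(rep(FALSE, n - 1), TRUE))
```

Because the runs are rebuilt rather than sliced out of x, any FALSE values after the last TRUE are dropped, and values other than FALSE (for example NA) inside a run would not be preserved.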

Benchmarking

Comparing the solutions (the result rows appear in the same order as the expressions in the call):

library(microbenchmark)

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

microbenchmark(

  splitAt(x, which(x)+1),

  {r <- rle(x)$lengths
  lapply(r[seq(1,length(r), by=2)] , function(x) c(rep(FALSE, x), TRUE))},

  split(x, sum(x) - rev(cumsum(rev(x))) ),

  trimws(strsplit(paste(x, collapse=" "), "(?<=TRUE)", perl=TRUE)[[1]]),

  lapply(diff(c(0, which(x))), function(x) c(rep(FALSE, (x-1)), TRUE))

)


  #    min       lq      mean   median       uq     max neval
  # 83.827  86.3910  91.76449  88.9155  92.8350 155.722   100
  # 94.373  97.6275 105.10872 101.1455 105.8545 307.927   100
  # 85.532  88.0660  93.59524  91.7935  95.3715 126.419   100
  #145.233 147.8755 152.65975 150.3250 156.5910 177.807   100
  # 26.451  29.6130  31.81785  31.0470  33.1895  43.267   100
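Timings are only comparable if the expressions return the same thing, so a quick sanity check is worthwhile. A minimal sketch, assuming the logical x from this answer; the names res_split, res_rle, res_cumsum and res_diff are illustrative. The strsplit variant is left out because it returns character strings rather than logical vectors:

```r
x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
       FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

res_split <- splitAt(x, which(x) + 1)

# rle-based variant: odd-numbered runs are the FALSE runs for this input
r <- rle(x)$lengths
res_rle <- lapply(r[seq(1, length(r), by = 2)], function(n) c(rep(FALSE, n), TRUE))

# cumsum-based variant; unname() drops the group labels added by split()
res_cumsum <- unname(split(x, sum(x) - rev(cumsum(rev(x)))))

res_diff <- lapply(diff(c(0, which(x))), function(n) c(rep(FALSE, n - 1), TRUE))

# All three comparisons should be TRUE for this input
identical(res_split, res_rle)
identical(res_split, res_cumsum)
identical(res_split, res_diff)
```

The rle variant relies on x starting with FALSE and never containing two TRUE values in a row, so agreement on this input does not guarantee agreement on arbitrary data.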

Data

x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
tospig