Subset in R using last value

Question

I have a dataset like

 x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

and I want to split it into all the consecutive subsets that end in TRUE, for example:

FALSE FALSE FALSE TRUE
FALSE FALSE TRUE
FALSE FALSE FALSE TRUE
FALSE TRUE

I have tried using loops, and I have tried taking the 5 values above each TRUE, but because the runs are of unequal length I am not getting the desired results. This is a reproducible example; the original data has some more discrepancies. Any solution would be highly appreciated.

Also `trimws(strsplit(paste(x, collapse=" "), "(?<=TRUE)", perl=T)[[1]])` — Pierre L, Feb 22 '16 at 05:42

score 3 · Accepted Answer · edited May 23 '17 at 12:23

You can achieve what you want using only two lines of code:

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

> splitAt(x, which(x == "TRUE") + 1)
[[1]]
[1] "FALSE" "FALSE" "FALSE" "TRUE" 

[[2]]
[1] "FALSE" "FALSE" "TRUE" 

[[3]]
[1] "FALSE" "FALSE" "FALSE" "TRUE" 

[[4]]
[1] "FALSE" "TRUE"

Data:

x <- c("FALSE", "FALSE", "FALSE", "TRUE",
       "FALSE", "FALSE", "TRUE",
       "FALSE", "FALSE", "FALSE", "TRUE",
       "FALSE", "TRUE")

Credit goes to this great SO answer, which came up with the very useful splitAt() function used above.
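
To see why splitAt() produces these pieces, it can help to look at the intermediate values. The following is a minimal sketch, assuming x is the character vector from the Data block above (so the TRUE positions are found with x == "TRUE"). The names pos, starts, grp and pieces are only illustrative and are not part of the original answer.

```r
x <- c("FALSE", "FALSE", "FALSE", "TRUE",
       "FALSE", "FALSE", "TRUE",
       "FALSE", "FALSE", "FALSE", "TRUE",
       "FALSE", "TRUE")

# A new piece starts one position after every "TRUE"
pos <- which(x == "TRUE") + 1        # 5 8 12 14

# Flag the positions where a new piece begins
# (14 lies past the end of x, so it matches nothing)
starts <- seq_along(x) %in% pos

# Running count of the starts seen so far = group id of each element
grp <- cumsum(starts)                # 0 0 0 0 1 1 1 2 2 2 2 3 3

# split() collects the elements that share a group id
pieces <- unname(split(x, grp))
lengths(pieces)                      # 4 3 4 2
```

Because split() only regroups the existing elements, any trailing values after the last "TRUE" would end up in a final piece of their own rather than being dropped.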

answered Feb 22 '16 at 05:07

Tim Biegeleisen


Variation on a theme, assuming `x <- as.logical(x)` - `split(x, sum(x) - rev(cumsum(rev(x))) )` – thelatemail Feb 22 '16 at 05:10
Can I save it to a data frame? – R Vij Feb 24 '16 at 07:21
What structure do you want? The output you specified in your OP does not seem to lend itself well to a data frame. A data frame can easily be converted to a list (it _is_ a list), but not vice-versa. – Tim Biegeleisen Feb 24 '16 at 07:27
OK, thanks for the help. I have found a way to split the list and convert it to a data frame using the transpose function. – R Vij Feb 25 '16 at 05:05
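
The commenter does not show the conversion they ended up with. One possible approach, offered purely as an assumption and not as what was actually used, is to pad the pieces with NA to a common length and bind them as rows. The names pieces, n, padded and df are hypothetical.

```r
# The four pieces from the answer, written out as logical vectors
pieces <- list(c(FALSE, FALSE, FALSE, TRUE),
               c(FALSE, FALSE, TRUE),
               c(FALSE, FALSE, FALSE, TRUE),
               c(FALSE, TRUE))

# Length of the longest piece
n <- max(lengths(pieces))

# Pad the shorter pieces with NA so that every row has n entries
padded <- lapply(pieces, function(p) c(p, rep(NA, n - length(p))))

# One row per piece; columns get the default names V1, V2, ...
df <- as.data.frame(do.call(rbind, padded))
df
```

The NA padding is what makes the ragged list fit a rectangular structure, which is the difficulty raised in the earlier comment.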

tospig · Answer 2 · 2016-02-23T00:12:44.507

This can be done with a simple one-line lapply:

lapply(diff(c(0, which(x))), function(x) c(rep(FALSE, (x-1)), TRUE))

#[[1]]
#[1] FALSE FALSE FALSE  TRUE

#[[2]]
#[1] FALSE FALSE  TRUE

#[[3]]
#[1] FALSE FALSE FALSE  TRUE

#[[4]]
#[1] FALSE  TRUE

Explanation

which(x) gives us the positions of the TRUE values (4, 7, 11, 13)
Starting from 0, we take the difference between consecutive TRUE positions, which is the number of FALSE values in each run plus one for the closing TRUE: diff(c(0, which(x))) gives 4 3 4 2
For each of these values (called x inside the anonymous function, where it shadows the original vector) we want a vector of length x, with x - 1 FALSE values followed by 1 TRUE: c(rep(FALSE, (x-1)), TRUE)
The lapply does this for each of the values 4 3 4 2 and returns a list
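
The steps above can be sketched one at a time as follows. This is only an illustration: the names ends, run_len and pieces are not from the answer, and the inner argument is called n here to avoid shadowing x.

```r
x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
       FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

# Positions of the TRUE values
ends <- which(x)                 # 4 7 11 13

# Length of each piece, including its closing TRUE
run_len <- diff(c(0, ends))      # 4 3 4 2

# Rebuild each piece: n - 1 FALSE values followed by one TRUE
pieces <- lapply(run_len, function(n) c(rep(FALSE, n - 1), TRUE))

# Gluing the pieces back together reproduces x
identical(unlist(pieces), x)     # TRUE
```

Note that this approach rebuilds the pieces from their lengths rather than splitting x itself, so any FALSE values after the last TRUE would not appear in the result.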

Benchmarking

Comparing the solutions

library(microbenchmark)

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

microbenchmark(

  splitAt(x, which(x)+1),

  {r <- rle(x)$lengths
  lapply(r[seq(1,length(r), by=2)] , function(x) c(rep(FALSE, x), TRUE))},

  split(x, sum(x) - rev(cumsum(rev(x))) ),

  trimws(strsplit(paste(x, collapse=" "), "(?<=TRUE)", perl=TRUE)[[1]]),

  lapply(diff(c(0, which(x))), function(x) c(rep(FALSE, (x-1)), TRUE))

)


  # one row per expression, in the order passed to microbenchmark() above
  #    min       lq      mean   median       uq     max neval
  # 83.827  86.3910  91.76449  88.9155  92.8350 155.722   100
  # 94.373  97.6275 105.10872 101.1455 105.8545 307.927   100
  # 85.532  88.0660  93.59524  91.7935  95.3715 126.419   100
  #145.233 147.8755 152.65975 150.3250 156.5910 177.807   100
  # 26.451  29.6130  31.81785  31.0470  33.1895  43.267   100
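
Timings are only comparable if the candidates return the same thing. As a rough check, and assuming the logical x from the Data block below, the four list-returning approaches can be compared like this (the strsplit() variant is left out because it returns character strings rather than a list of logical vectors). The res_* names are illustrative only.

```r
x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE,
       FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)

splitAt <- function(x, pos) unname(split(x, cumsum(seq_along(x) %in% pos)))

# splitAt() from the accepted answer
res_splitAt <- splitAt(x, which(x) + 1)

# rle() variant: every odd-indexed run is a run of FALSE values
r <- rle(x)$lengths
res_rle <- lapply(r[seq(1, length(r), by = 2)],
                  function(n) c(rep(FALSE, n), TRUE))

# Reverse-cumsum variant from the comments (names dropped for comparison)
res_cumsum <- unname(split(x, sum(x) - rev(cumsum(rev(x)))))

# diff()/which() variant
res_diff <- lapply(diff(c(0, which(x))),
                   function(n) c(rep(FALSE, n - 1), TRUE))

# All four should agree on this input
identical(res_splitAt, res_rle) &&
  identical(res_splitAt, res_cumsum) &&
  identical(res_splitAt, res_diff)
```

The rle() variant relies on x starting with FALSE and never containing two TRUE values in a row, which holds for this example.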

Data

x <- c(FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE)
