0

Similar questions have been asked here and here. But they don't specifically help my issue.

I have a large list of numerical vectors. What I'm trying to do is check if the sequence 1, 1, 1 appears anywhere in my list. And if it does appear, change it to 1, 1, 3.

For example, if I have a list that looks like this:

myList <- list(c(1,1,1,2,3,5), 
               c(1,1,2), 
               c(1,2,3,4), 
               c(1,1,1,5,8))

We can see that the sequence 1, 1, 1 appears in myList[[1]] and myList[[4]]. Im trying to check each element of myList, find that sequence and then change the 3rd instance of the number 1 to a number 3. In this example, my desired output would look like:

[[1]]
[1] 1 1 3 2 3 5

[[2]]
[1] 1 1 2

[[3]]
[1] 1 2 3 4

[[4]]
[1] 1 1 3 5 8 

Any suggestions as to how I could achieve this?

Electrino
  • 2,636
  • 3
  • 18
  • 40
  • Is this the only pattern you need to match? Or how generalizable does the solution need to be? Is the sequence always the same number repeated multiple times? What if you have `1,1,1,1,1`? Does that become `1,1,3,1,1`? – MrFlick Feb 09 '22 at 05:41
  • This is the only pattern I have to match. In my list there can never be more than three instances of the number 1. So, `1,1,1,1` cannot exist in my list. And it will always be the sequence `1,1,1` that I want to change to `1,1,3` – Electrino Feb 09 '22 at 05:44

5 Answers5

4

Here's a bit of a hack: not efficient, but it works:

# the c(1,1,1) here is what you're looking to find ...
needle <- paste0("\\b", paste(c(1, 1, 1), collapse = ","), "\\b")
# ... and c(1,1,3) what you want to change it to
changeto <- paste(c(1, 1, 3), collapse = ",")

myList2 <- lapply(
  strsplit(sapply(myList, function(L) gsub(needle, changeto, paste(L, collapse = ","))), ","),
  as.integer)
str(myList2)
# List of 4
#  $ : int [1:6] 1 1 3 2 3 5
#  $ : int [1:3] 1 1 2
#  $ : int [1:4] 1 2 3 4
#  $ : int [1:5] 1 1 3 5 8

I say it's a bit of a hack because it first collapses the vector of numbers into a comma-separated string, then gsubs the pattern as one would single strings, then splits and re-integerizes it. Not the most efficient, but it works.

r2evans
  • 141,215
  • 6
  • 77
  • 149
2

We can use rollapplyr to successively compare against c(1, 1, 1).

library(zoo)

fun <- function(y) if (identical(y, c(1, 1, 1))) 3 else tail(y, 1)
lapply(myList, rollapplyr, 3, fun, partial = TRUE)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

Completely different approach that may (under-tested) provide a more general approach to finding (and optionally replacing) subvectors in a larger vector. (Caveat emptor: I make no guarantees about "good performance" here, realize that at scale it will likely be relatively slow.)

The helper function:

#' Find (and perhaps replace) needles in haystacks
#'
#' @param haystack vector
#' @param needle vector, shorter than the length of haystack
#' @param replacement vector (optional), replace 'needle' with this
#' @param arr.ind logical, if 'replacement' is not provided, return
#'   logical (false, default) or integer (true)
#' @param fromLast logical, whether replacements should proceed from
#'   the right-side (true, default) or from the left; this is only a
#'   factor if overlapping subvectors are found
#' @return if 'replacement' is missing, then vector of logical or
#'   integer; otherwise 'haystack' optionally updated
#' @importFrom zoo rollapply
func <- function(haystack, needle, replacement, arr.ind = FALSE, fromLast = TRUE) {
  found <- zoo::rollapply(haystack, length(needle), identical, needle,
                          partial = TRUE, align = "left")
  if (missing(replacement)) {
    if (arr.ind) which(found) else found
  } else {
    needlen <- length(needle)
    Reduce(function(prev, this) {
      c(prev[seq_len(this-1)],
        replacement,
        prev[-seq_len(this + needlen - 1)])
    }, if (fromLast) rev(which(found)) else which(found), init = haystack)
  }
}

Using myList from above,

lapply(myList, func, c(1,1,1))
# [[1]]
# [1]  TRUE FALSE FALSE FALSE FALSE FALSE
# [[2]]
# [1] FALSE FALSE FALSE
# [[3]]
# [1] FALSE FALSE FALSE FALSE
# [[4]]
# [1]  TRUE FALSE FALSE FALSE FALSE

lapply(myList, func, c(1,1,1), arr.ind = TRUE)
# [[1]]
# [1] 1
# [[2]]
# integer(0)
# [[3]]
# integer(0)
# [[4]]
# [1] 1

lapply(myList, func, c(1,1,1), c(1,1,3))
# [[1]]
# [1] 1 1 3 2 3 5
# [[2]]
# [1] 1 1 2
# [[3]]
# [1] 1 2 3 4
# [[4]]
# [1] 1 1 3 5 8
r2evans
  • 141,215
  • 6
  • 77
  • 149
0

FWIW, here's a C implementation accepting any pattern and any replacement of equal length.

sig <- c(pattern = "double", replacement = "double", x = "list")
bod <- "
SEXP y;
double *py, *pp = REAL(pattern), *pr = REAL(replacement);
R_xlen_t ny, np = xlength(pattern), nr = xlength(replacement), nx = xlength(x);
size_t sp = ((size_t) np) * sizeof(double);

if (np == 0 || nr != np) {
    error(\"'pattern' and 'replacement' must have equal, nonzero length.\");
}

for (R_xlen_t i = 0, j; i < nx; ++i) {
    y = PROTECT(VECTOR_ELT(x, i));
    py = REAL(y);
    ny = xlength(y) - np + 1;
    j = 0;
    while (j < ny) {
        if (memcmp(py, pp, sp)) {
            ++j; 
            ++py;
        } else {
            memcpy(py, pr, sp);
            j += np; 
            py += np;
        }
    }
    UNPROTECT(1);
}
return x;
"
double_gsub <- inline::cfunction(sig, bod, language = "C")
x <- list(c(1, 1, 1, 2, 1, 1, 1, 4, 1, 1, 1, 8),
          c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1))

double_gsub(pattern = c(1, 1, 1), replacement = c(1, 1, 3), x = x)
## [[1]]
##  [1] 1 1 3 2 1 1 3 4 1 1 3 8
## 
## [[2]]
##  [1] 1 1 3 1 1 3 1 1 3 1 1 3
Mikael Jagan
  • 9,012
  • 2
  • 17
  • 48
0

My comment was:

@MrFlick I think the problem should be amenable to an rle based strategy. Identify the position of a triple-1 sequence and then calculate the position of the single 1 to be replaced in the original using cumsum of the lengths prior to that location and add 3. I'll see if I can cobble together a tested solution.

I agreed that a rollapply strategy would be more elegant and generalizable, but here's the requested rle approach:

for( i in seq_along(myList))  { 
    rL <-rle(myList[[i]] ) # build a separate rle for each vector
    pos <-which(rL$lengths==3 & rL$values==3) #  where length & value criteria met
        if( length(pos)  ) {  # FALSE if none found
            loc <- cumsum(rL$lengths[pos-1]+3)}else{  # add 3 to cumsum of priors
            loc <- 3}  #  nothing to add to
        myList[[i]][loc] <- 3 
    }   # will ignore list items that don't meet criteria

 myList
#---------
[[1]]
[1] 1 1 3 2 3 5

[[2]]
[1] 1 1 3

[[3]]
[1] 1 2 3 4

[[4]]
[1] 1 1 3 5 8

This solution will handle an arbitrary number of identical values in sequence and could also replace values at different locations inside that sequence, but it could not handle what r2evans is calling an inhomogenous sequence.

IRTFM
  • 258,963
  • 21
  • 364
  • 487