How can I find and replace a specific sequence of numbers in a vector in R?

Question

I need to replace the sequence "1,0,1" with "1,1,1" whenever it is found in a vector. How can I do this?

x <- c(1,2,3,4,1,0,1)

Edit: This search needs to be dynamic. If after changing from 1,0,1 to 1,1,1 another 1,0,1 occurs, this must also be replaced.

Considering:

x <- c(1,2,3,4,1,0,1,0,1,2)

I want the algorithm to do:

x <- c(1,2,3,4,1,1,1,0,1,2)

And after:

x <- c(1,2,3,4,1,1,1,1,1,2)

Messy, but you could do something like `scan(text = gsub("1, 0, 1", "1, 1, 1", toString(x)), sep = ",")`. – A5C1D2H2I1M1N2O1R2T1, Jun 14 '20 at 19:13
@Onyambu In this case the algorithm should do: `x <- c(1,2,3,4,1,0,1,0,1)` `x <- c(1,2,3,4,1,1,1,0,1)` After that, it should detect the sequence 1,0,1 again: `x <- c(1,2,3,4,1,1,1,1,1)` – Antonio Mendes, Jun 15 '20 at 21:28

r2evans · Answer 1 · 2020-06-14T20:26:57.510

3

Here is a function that works for a sub-vector of any length. Solutions that convert to and from strings are going to be hugely inefficient as the vector grows. Solutions that hard-code a sub-vector of length 3 are limited to sub-vectors of length 3. This one handles any sub-vector; if the sub-vector is longer than the source vector, it simply returns no matches.

#' Find a matching sub-vector
#'
#' Given a vector (`invec`) and a no-larger sub-vector (`subvec`),
#' find every position in `invec` where `subvec` occurs exactly.
#' @param invec vector to search in
#' @param subvec vector to search for
#' @return integer start positions of the matches, length 0 or more
find_subvec <- function(invec, subvec) {
  # A sub-vector longer than the source vector cannot match
  if (length(subvec) > length(invec)) return(integer(0))
  # Offsets 0, 1, ..., length(subvec) - 1 from a candidate start position
  sublen <- seq_along(subvec) - 1L
  which(
    sapply(seq_len(length(invec) - length(subvec) + 1L),
           function(i) all(subvec == invec[i + sublen]))
  )
}

Use:

find_subvec(c(1,2,3,4,1,0,1), c(1,0,1))
# [1] 5
find_subvec(c(1,2,3,4,1,0,1,0,1), c(1,0,1))
# [1] 5 7

A literal replacement:

x <- c(1,2,3,4,1,0,1)
y <- c(1,0,1)
z <- c(1,1,1)
ind <- find_subvec(x, y)
for (i in ind) x[i + seq_along(y) - 1] <- z
x
# [1] 1 2 3 4 1 1 1
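
The question's edit asks for the search to be repeated after every change. As a rough sketch of that "check after each change" behaviour (the helper name `replace_until_stable` and its `max_iter` safety cap are assumptions for illustration, not part of the answer above), one could replace the leftmost match and rescan until nothing matches:

```r
# Hypothetical helper: replace the leftmost match, then rescan from scratch,
# until no match is left. `max_iter` is an assumed safety cap against
# patterns that would otherwise loop forever (e.g. pattern == replacement).
replace_until_stable <- function(x, pattern, replacement, max_iter = 1000L) {
  stopifnot(length(pattern) > 0L, length(pattern) == length(replacement))
  offsets <- seq_along(pattern) - 1L
  for (iter in seq_len(max_iter)) {
    n_starts <- length(x) - length(pattern) + 1L
    if (n_starts < 1L) return(x)
    # Does the pattern start at position i of the current x?
    hits <- vapply(seq_len(n_starts),
                   function(i) all(x[i + offsets] == pattern),
                   logical(1))
    first <- which(hits)[1L]
    if (is.na(first)) return(x)        # no match left: x is stable
    x[first + offsets] <- replacement  # replace the leftmost match, then rescan
  }
  stop("no stable result after ", max_iter, " replacements")
}

replace_until_stable(c(1,2,3,4,1,0,1,0,1,2), c(1,0,1), c(1,1,1))
# [1] 1 2 3 4 1 1 1 1 1 2
```

Rescanning the whole vector after each replacement is simple but does more work than necessary on long vectors; it is meant to show the intent rather than to be fast.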

edited Jun 14 '20 at 20:26

answered Jun 14 '20 at 19:32

r2evans

Sidenote: for the `find_subvec`, you may have a look at [Get indexes of a vector of numbers in another vector](https://stackoverflow.com/questions/48660606/get-indexes-of-a-vector-of-numbers-in-another-vector) for some alternatives on that theme. Cheers – Henrik Jun 14 '20 at 20:16
[The approach I had considered](https://gist.github.com/mrdwab/30a0eeb0eb795a746d59bf116a24a8ed) is similar enough to yours that I don't feel it's worth posting as a separate answer. The main differences are first checking whether we need to do anything, and second, using a simpler loop rather than `sapply`, which can be a bit expensive. Note that the results aren't identical to yours (see the comment at the gist). – A5C1D2H2I1M1N2O1R2T1 Jun 14 '20 at 20:36
One of the key differences that you're highlighting is an assumption of *"conditions before any change"* versus *"check the condition after each change"*. The latter is a little more work, perhaps. I don't think that either one is categorically better, since it depends wholly on the intent of the OP. As for `sapply` being slower than a plain loop ... confirmed and interesting! For years the presumption was the *opposite*! – r2evans Jun 14 '20 at 21:07
Switching to `vapply` cuts the performance margin significantly, but it's still higher. (*head-scratching* ...) – r2evans Jun 14 '20 at 21:10
`vapply` should generally be faster than `sapply` if only because `sapply` looks at whether the results can be simplified, whereas with `vapply` you're providing the template you want. Agreed that without the OP's input on checking conditions before or after the change, we're sort of stabbing in the dark here. – A5C1D2H2I1M1N2O1R2T1 Jun 14 '20 at 21:48
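
For reference, the `vapply` variant discussed in these comments might look roughly like the sketch below. The name `find_subvec_v` is made up for illustration, and no timing claim is made here; the only change from `find_subvec` is that the expected result type is declared up front.

```r
# Sketch of the same search using vapply(); find_subvec_v is a hypothetical name.
find_subvec_v <- function(invec, subvec) {
  if (length(subvec) > length(invec)) return(integer(0))
  offsets <- seq_along(subvec) - 1L
  which(
    vapply(seq_len(length(invec) - length(subvec) + 1L),
           function(i) all(subvec == invec[i + offsets]),
           FUN.VALUE = logical(1))  # each call must return exactly one logical
  )
}

find_subvec_v(c(1,2,3,4,1,0,1,0,1), c(1,0,1))
# [1] 5 7
```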

tmfmnk · Answer 2 · 2020-06-14T21:30:00.143

1

There could be edge cases as mentioned by @Onyambu when the expected results are not clear, but one option could be:

x + (x == 0 & c(NA, head(x, -1)) == 1 & c(tail(x, -1), NA) == 1)

[1] 1 2 3 4 1 1 1

Here, it is not treating x as a string, but it is assessing whether the lag and lead values are 1 and the value in the middle is 0.
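
To make the lag/lead idea easier to follow, here is a sketch that names each piece. The function name `fill_101` is an assumption for illustration. One deliberate difference from the one-liner above: the condition is wrapped in `which()`, because a 0 in the first or last position next to a 1 makes the comparison `NA`, and adding `NA` would put `NA` into the result.

```r
# Hypothetical wrapper around the lag/lead comparison.
fill_101 <- function(x) {
  prev_val <- c(NA, head(x, -1))  # value to the left of each element (lag)
  next_val <- c(tail(x, -1), NA)  # value to the right of each element (lead)
  # Positions of a 0 sitting between two 1s; which() drops NA comparisons
  gap <- which(x == 0 & prev_val == 1 & next_val == 1)
  x[gap] <- 1
  x
}

fill_101(c(1, 2, 3, 4, 1, 0, 1, 0, 1, 2))
# [1] 1 2 3 4 1 1 1 1 1 2
fill_101(c(0, 1, 0, 1))
# [1] 0 1 1 1
```

For this particular pattern a single pass gives the same result as repeated replacement: a 0 is only filled when both neighbours are already 1, so a newly created 1 is never adjacent to another 0 and cannot create a new match.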

edited Jun 14 '20 at 21:30

answered Jun 14 '20 at 19:36

tmfmnk

score -1 · Answer 3 · answered Jun 14 '20 at 19:36

-1

This should work well enough, though note that the result is a single string rather than a numeric vector:

library(tidyverse)

x <- c(1,2,3,4,1,0,1,0,1)

x %>% 
  reduce(str_c) %>% 
  str_replace_all("(?<=1)0(?=1)","1")
#> [1] "123411111"

Created on 2020-06-14 by the reprex package (v0.3.0)
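
If a numeric vector is needed at the end, the same lookaround idea can be sketched in base R and converted back. This is an illustration rather than the answerer's code, and it assumes every element of `x` is a single digit; with multi-digit values the collapsed string would be ambiguous.

```r
x <- c(1, 2, 3, 4, 1, 0, 1, 0, 1)

# Collapse to one string; only safe when every element is a single digit
s <- paste(x, collapse = "")

# Lookbehind/lookahead require perl = TRUE in base R's gsub()
s <- gsub("(?<=1)0(?=1)", "1", s, perl = TRUE)

# Split into single characters and convert back to numbers
result <- as.numeric(strsplit(s, "")[[1]])
result
# [1] 1 2 3 4 1 1 1 1 1
```

Because the lookarounds do not consume the surrounding 1s, overlapping occurrences such as `1,0,1,0,1` are handled in one pass.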

answered Jun 14 '20 at 19:36

Bruno