
I have a vector of length "l" that must consist of a strictly specified, repeating sequence, for example "1 2 3":

x <- c(1,2,3,1,2,3,1,2,3,1,3,1,2,3,1,2,3,1,2,3,1,2,3,2,3,1,2,3,1,2,3,1,2,3,1,2,3)

However, it contains gaps such as "1 3" or "2 3", or other similar violations of the sequence. How can I find the errors in the sequence and remove those incomplete sequences?

Phil
Comment: This should help: https://stackoverflow.com/questions/48660606/get-indexes-of-a-vector-of-numbers-in-another-vector/48708439#48708439 – jblood94 Mar 12 '21 at 18:20

3 Answers


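You can remove the "gaps" by looping through x. Change seq in the following code to scan for any sequence you like, as long as its first element does not reappear inside it. Each candidate runs from the start of one sequence up to the start of the next, so a run such as 1 2 3 2 3 is removed as a whole.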

x <- c(1,2,3,1,2,3,1,2,3,1,3,1,2,3,1,2,3,1,2,3,1,2,3,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
seq <- c(1,2,3)

l <- 1
### stop once the vector is completely scanned
while (l <= length(x)) {
  ### check if a new sequence starts
  if (x[l] == seq[1]) {
    counter <- 0
    ### count elements of the sequence candidate
    while (TRUE) {
      ### break if a new sequence starts or if the vector is completely scanned
      if (x[l + counter + 1] %in% c(seq[1], NA)) break
      counter <- counter + 1
    }
    ### remove current sequence if not identical to seq
    if (!identical(x[l:(l + counter)], seq)) {
      x <- x[-(l:(l + counter))]
      ### x[l] is now the first unscanned element, so do not advance l
      next
    }
  }
  l <- l + 1
}
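A vectorized sketch of the same idea, not part of the answer above (drop_incomplete_blocks is a hypothetical name): cumsum() gives every element the number of the block it belongs to, where a new block starts at each occurrence of v[1], and only the blocks exactly equal to v are kept. Like the loop, it assumes v[1] does not reappear inside v and drops a candidate such as 1 2 3 2 3 as a whole.

```r
# Hypothetical helper (not from the answer above): give each element the
# number of the block it belongs to, where a new block starts at every v[1],
# then keep only the elements of blocks that are exactly equal to v.
drop_incomplete_blocks <- function(x, v) {
  block_id <- cumsum(x == v[1])    # 0 for anything before the first v[1]
  blocks <- split(x, block_id)     # list of candidate blocks, named by id
  ok <- vapply(blocks,
               function(b) isTRUE(length(b) == length(v) && all(b == v)),
               logical(1))
  keep <- ok[as.character(block_id)]  # look up each element's block result
  x[keep]
}

x <- c(1,2,3,1,2,3,1,2,3,1,3,1,2,3,1,2,3,1,2,3,1,2,3,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
drop_incomplete_blocks(x, c(1, 2, 3))
```

With the question's x this keeps 10 complete blocks (30 values). Unlike the loop, it also drops anything that comes before the first v[1].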

Here is one base R option using gregexpr:

v <- c(1, 2, 3)
# gregexpr() returns -1 when there is no match, so count only the positive match positions
m <- gregexpr(toString(v), toString(x))[[1]]
res <- rep(v, sum(m > 0))

which gives

> res
 [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
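The gregexpr approach rebuilds the result from a match count with rep(), so it does not tell you where the broken pieces were. A position-based sketch of the same left-to-right, non-overlapping matching (complete_match_mask is a hypothetical helper, not part of this answer) compares the numbers directly instead of their string form and marks every element that belongs to a complete copy of v:

```r
# Hypothetical helper (not from the answer above): scan x left to right and
# mark every element that belongs to a complete, non-overlapping copy of v.
complete_match_mask <- function(x, v) {
  k <- length(v)
  mask <- logical(length(x))   # all FALSE to start with
  i <- 1
  while (i <= length(x) - k + 1) {
    idx <- i:(i + k - 1)
    if (isTRUE(all(x[idx] == v))) {
      mask[idx] <- TRUE        # complete copy of v: keep it and jump past it
      i <- i + k
    } else {
      i <- i + 1               # no copy starts here: try the next position
    }
  }
  mask
}

x <- c(1,2,3,1,2,3,1,2,3,1,3,1,2,3,1,2,3,1,2,3,1,2,3,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
mask <- complete_match_mask(x, c(1, 2, 3))
x[mask]       # complete sequences only
which(!mask)  # positions of the removed elements
```

For the question's x, which(!mask) gives 10 11 24 25, and x[mask] holds the same 33 values as res.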
ThomasIsCoding

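I did this and it works on the example above. The idea is to store the positions of all the wrong patterns in a vector and then remove them. Note that it only flags a value that is neither 1 nor one more than the previous value, so a sequence cut short right before the next 1 (e.g. 1 2 1) is not detected.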

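data <- c(1,2,3,1,2,3,1,2,3,1,3,1,2,3,1,2,3,1,2,3,1,2,3,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
b <- c()   # positions to remove
a <- 1     # previous value in the current sequence
for (i in seq_along(data)) {
  if (i > 1 && data[i] != 1) {
    # a value that is not previous + 1 breaks the pattern:
    # mark it and its predecessor for removal
    if (data[i] != a + 1) {
      b <- c(b, i, i - 1)
    }
    a <- data[i]
  }
  # a 1 always starts a new sequence
  if (data[i] == 1) {
    a <- data[i]
  }
}
# data[-b] would return an empty vector if b were empty
if (length(b) > 0) data <- data[-b]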
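The rule in this loop can also be written without a loop. In this sketch (drop_breaking_pairs is a hypothetical name, and it assumes data starts with 1), diff() finds every position whose value is neither 1 nor one more than its predecessor, and that position and its predecessor are dropped, as in the loop.

```r
# Hypothetical vectorized version of the rule in the loop above; assumes data
# starts with 1. Position i breaks the pattern when data[i] is neither 1 nor
# one more than data[i - 1]; that position and its predecessor are dropped.
drop_breaking_pairs <- function(data) {
  i <- which(data[-1] != 1 & diff(data) != 1) + 1  # +1: data[-1] is shifted
  b <- unique(c(i, i - 1))
  if (length(b) > 0) data[-b] else data  # data[-b] with empty b would drop everything
}

data <- c(1,2,3,1,2,3,1,2,3,1,3,1,2,3,1,2,3,1,2,3,1,2,3,2,3,1,2,3,1,2,3,1,2,3,1,2,3)
drop_breaking_pairs(data)
```

On the question's data it removes positions 10, 11, 23 and 24, the same positions that b collects in the loop.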
elielink