1

I am trying to obtain the number of times that a certain numerical combination is repeated within a vector.

For example, I have this sequence of numbers and I want to know how many times a certain "block of numbers" is repeated:

test <- c(1,2,3,4,1,2,3,4,4,4,4,4)

I would like to choose the length of the "block". Let's say for example length=4.

If you know of a base function or another solution in R, I would like to get all the possible outcomes, which should be:

2 times I found the combination 1 2 3 4
1 time I found the combination 4 4 4 4

Could you help me solve this? I'm interested in both the number of times each combination is found and the numbers that make up that combination.

I also tried to solve it with hints from "Generate list of all possible combinations of elements of vector", but I was not able to obtain the results I expect.

harre
cucalorda
  • Hi, although your explanation lacks some context, it still has a clear combinatorial interpretation. Do you mean: given a sequence S and a word w, how many times does w appear in S? – Bruno Peixoto Dec 01 '22 at 15:53

3 Answers

1

Based on your desired output (which counts consecutive, non-overlapping chunks rather than all possible chunks of length n in the vector), one way is to split the vector into chunks of the desired length, convert each chunk to a string, count the occurrences with table(), stack() the counts into a data frame, and paste() the pieces together to get the desired output.

You'll want to decide yourself how it should handle cases where length(test)/n is not an integer. It could throw an error or you might want to round in some way.

n <- 4

results <-
  split(test, cut(seq_along(test), length(test)/n, labels = FALSE)) |> 
  lapply(FUN = \(x) paste(x, collapse = " ")) |>
  unlist() |>
  table() |>
  stack()

paste(results$values, "times I found the combination", results$ind)

Output:

[1] "2 times I found the combination 1 2 3 4" "1 times I found the combination 4 4 4 4"
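As a rough sketch of how the non-integer case mentioned above could be handled, the chunking can be wrapped in a function that either stops or drops the incomplete trailing chunk. The function name `count_blocks` and its `incomplete` argument are made up for this illustration and are not part of the answer's code; the block index is also computed with `ceiling()` instead of `cut()`.

```r
# Hypothetical helper: count non-overlapping blocks of length n in x.
# incomplete = "error" stops if length(x) is not a multiple of n;
# incomplete = "drop" discards the trailing elements that do not fill a block.
count_blocks <- function(x, n, incomplete = c("error", "drop")) {
  incomplete <- match.arg(incomplete)
  remainder <- length(x) %% n
  if (remainder != 0) {
    if (incomplete == "error") {
      stop("length(x) is not a multiple of n")
    }
    x <- x[seq_len(length(x) - remainder)]
  }
  # Block index for each element: 1, 1, ..., 2, 2, ...
  block_id <- ceiling(seq_along(x) / n)
  # Collapse each block into a single string and tabulate the strings
  blocks <- vapply(split(x, block_id), paste, character(1), collapse = " ")
  table(blocks)
}

test <- c(1, 2, 3, 4, 1, 2, 3, 4, 4, 4, 4, 4)
counts <- count_blocks(test, 4)
counts_dropped <- count_blocks(c(test, 9), 4, incomplete = "drop")
```

Whether stopping or dropping is the better default depends on the data; rounding in some other way would need a different rule for the leftover elements.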
harre
  • Thanks a lot for the answer @harre. How could I handle cases where I don't know the block length? – cucalorda Dec 01 '22 at 16:07
  • 2
    However, this solution "cuts" the vector into 3 blocks (in this case), whereas I interpret the question as searching with a moving window, that is, 8 possible combinations. – Ric Dec 01 '22 at 16:13
  • 1
    @cucalorda: If you don't know the block length, you'll need to loop through all possible block lengths :) We could use `seq(length(test))[length(test) %% seq(length(test)) == 0]` to find the block lengths that'll have solutions. – harre Dec 01 '22 at 16:20
  • @RicVillalba: You're right that the question is ambiguous, however, my proposed solution was based on the desired output. – harre Dec 01 '22 at 16:21
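A possible sketch of the loop suggested in the comments, assuming only block lengths that divide the vector length exactly are of interest. The names `block_lengths` and `counts_by_length` are chosen for this illustration.

```r
test <- c(1, 2, 3, 4, 1, 2, 3, 4, 4, 4, 4, 4)

# Block lengths that divide length(test) exactly
lens <- seq_along(test)
block_lengths <- lens[length(test) %% lens == 0]

# For each candidate length, tabulate the non-overlapping blocks
counts_by_length <- lapply(block_lengths, function(n) {
  block_id <- ceiling(seq_along(test) / n)
  table(vapply(split(test, block_id), paste, character(1), collapse = " "))
})
names(counts_by_length) <- block_lengths
```

The counts for a given length can then be looked up by name, for example `counts_by_length[["4"]]`.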
1
test <- c(1,2,3,4,1,2,3,4,4,4,4,4)

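n <- 4

# Return every sliding window of length i in x, one window per row
each_seq <- function(x, i){
  w <- length(x) - i + 1  # number of windows
  # Column k of ids holds the indices k, k + 1, ..., k + i - 1
  ids <- matrix(
    rep(1:i - 1, w) + rep(1:w, each = i), nrow = i)
  # simplify = FALSE requires R >= 4.1.0
  do.call(rbind, apply(ids, 2, function(j) x[j], simplify = FALSE))
}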

combs <- as.data.frame(each_seq(test, n))
groups <- interaction(combs)

aggregate(1:nrow(combs), by=list(comb = groups), length)
#>      comb x
#> 1 2.3.4.1 1
#> 2 3.4.1.2 1
#> 3 4.1.2.3 1
#> 4 1.2.3.4 2
#> 5 2.3.4.4 1
#> 6 3.4.4.4 1
#> 7 4.4.4.4 2
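As an alternative sketch of the same moving-window count, base R's `embed()` can build the window matrix directly. This swaps in a different technique from the function above, and the names `windows`, `keys` and `window_counts` are chosen for this illustration.

```r
test <- c(1, 2, 3, 4, 1, 2, 3, 4, 4, 4, 4, 4)
n <- 4

# embed() returns one window per row, with each window in reverse order,
# so the columns are flipped to restore the original order
windows <- embed(test, n)[, n:1, drop = FALSE]

# Collapse each window into a string and count the distinct strings
keys <- apply(windows, 1, paste, collapse = " ")
window_counts <- table(keys)
```

With this input there are `length(test) - n + 1` windows in total, so the counts sum to 9.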
Ric
0

Statement: Given a sequence S and a word w, find the natural number n of times that w appears in S.

Definition: a sequence is a concatenation of elements.

Definition: a word is a synonym for a sequence.

Requirements: The word length #w is less than the sequence length #S.

Algorithm:

  1. Place word w at the beginning of sequence S;
  2. Initialize counter c to 0;
  3. Check whether w matches the elements of S it currently overlaps: if it does, increase c by 1;
  4. Move word w right by 1 element;
  5. Repeat steps 3 and 4 until w extends past the end of S; the result n is the final value of c.
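The steps above could be sketched in R roughly as follows. The function name `count_word` is made up for this illustration, and the sketch also accepts a word as long as the whole sequence, which is slightly looser than the stated requirement.

```r
# Hypothetical helper: count how many times word w appears in sequence S,
# sliding w along S one element at a time (overlapping matches are counted)
count_word <- function(S, w) {
  stopifnot(length(w) >= 1, length(w) <= length(S))
  count <- 0L
  last_start <- length(S) - length(w) + 1
  for (start in seq_len(last_start)) {
    # Elements of S currently covered by the word
    window <- S[start:(start + length(w) - 1)]
    if (all(window == w)) {
      count <- count + 1L
    }
  }
  count
}

test <- c(1, 2, 3, 4, 1, 2, 3, 4, 4, 4, 4, 4)
n_1234 <- count_word(test, c(1, 2, 3, 4))
n_4444 <- count_word(test, c(4, 4, 4, 4))
```

Because the word moves one element at a time, `c(4, 4, 4, 4)` is found twice in `test` here, unlike the non-overlapping count in the question's desired output.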
Bruno Peixoto