Smallest sub-vector with all distinct values of the original vector

Question

Let's say we have a vector of length n with k distinct values.

{1, 4, 2, 2, 3, 4, 1, 1, 5, 2, 2}

How do I find out the start and end coordinates (on the original vector) of the smallest subset of values, sequence and frequency of individual elements conserved, that has all the distinct values of the original vector?

For our example, the subset would be

{2, 3, 4, 1, 1, 5}

and the start and end coordinates would be 4 and 9, respectively.

Dupe-oid: [Get indexes of a vector of numbers in another vector](https://stackoverflow.com/questions/48660606/get-indexes-of-a-vector-of-numbers-in-another-vector) — Henrik, Nov 15 '18 at 10:42
Since this is R and you took the time to make a numeric sequence why `{}` vs `c()`? Spidey-sense says `#homework` — hrbrmstr, Nov 15 '18 at 10:44
Haha no, I just used the sets notation as a matter of habit. I made up the sequence while writing the question and didn't copy it. — Sachin, Nov 15 '18 at 17:30

niko · Accepted Answer · 2018-11-15T11:29:37.453

Here is something that will do this task: First I create a vector index where index[k] is equal to the amount of indices to go (starting at k) until one has all the elements at least once, and it is equal to Inf if that is never the case.

# determining the unique elements of v
uniqueVals <- unique(v)
index <- numeric(length(v))

# helper function
myFun <- function(k){
   helper <- vapply(seq(k, length(v)), 
                    function(s) length(setdiff(uniqueVals, v[k:s])), 
                    numeric(1))
   return (ifelse(min(helper) == 0, which.min(helper)-1, Inf))
}

# indices in seq1 must be infinity as there are not enough values left
seq1 <- which(length(v) - seq_along(v) < length(uniqueVals))
index[seq1] <- Inf
# for the other indices we now use our helper function
index[seq(1, min(seq1)-1)] <- vapply(seq(1, min(seq1)-1), myFun, numeric(1)) 

# applying the above
startIndex <- which.min(index) 
endIndex <- index[startIndex] + startIndex
v[startIndex:endIndex]
# yielding
[1] 2 3 4 1 1 5

where v = c(1, 4, 2, 2, 3, 4, 1, 1, 5, 2, 2) and for any given k myFun will return the smallest number n such that v[k:n] contains every element of v.

Smallest sub-vector with all distinct values of the original vector

1 Answers1