13

In julia we can check if an array contains a value, like so:

> 6 in [4,6,5]
true

However this returns false, when attempting to check for a sub-array in a specific order:

> [4,6] in [4,6,5]
false

What is the correct syntax to verify if a specific sub-array exists in an array?

Nicky Feller
  • 3,539
  • 9
  • 37
  • 54
  • The second result in the question does not match its description. It is a tuple of `4` and the first result. – Dan Getz Apr 01 '16 at 00:23
  • Package [Iterators.jl](https://github.com/JuliaLang/Iterators.jl) also provides a useful function `subsets`, and you can write `[4,6] in subsets([4,5,6])`. – Gnimuc Apr 01 '16 at 01:37
  • That doesn't give the correct result, and even if it did, it doesn't scale at all (I benchmarked all of these with different lengths of vectors with Int64s) – Scott Jones Apr 02 '16 at 00:06
  • I misunderstood the question, for those who would like to check whether each element of array `A`(not consider `A` as a whole sequence) is included in another array `B`, `setdiff(A, B) |> isempty` is sufficient to do the job. – Gnimuc Jan 28 '18 at 10:54

7 Answers7

12

I think it is worth mentioning that in Julia 1.0 you have the function issubset

> issubset([4,6], [4,6,5])
true

You can also quite conveniently call it using the \subseteq latex symbol

> [4,6] ⊆ [4,6,5]
true

This looks pretty optimized to me:

> using Random

> x, y = randperm(10^3)[1:10^2], randperm(10^3);

> @btime issubset(x, y);
16.153 μs (12 allocations: 45.96 KiB)
rodrigolece
  • 1,039
  • 2
  • 13
  • 19
7

It takes a little bit of code to make a function that performs well, but this is much faster than the issubvec version above:

function subset2(x,y)
    lenx = length(x)
    first = x[1]
    if lenx == 1
        return findnext(y, first, 1) != 0
    end
    leny = length(y)
    lim = length(y) - length(x) + 1
    cur = 1
    while (cur = findnext(y, first, cur)) != 0
        cur > lim && break
        beg = cur
        @inbounds for i = 2:lenx
            y[beg += 1] != x[i] && (beg = 0 ; break)
        end
        beg != 0 && return true
        cur += 1
    end
    false
end

Note: it would also be much more useful if the function actually returned the position of the beginning of the subarray if found, or 0 if not, similarly to the findfirst/findnext functions.

Timing information (the second one is using my subset2 function):

  0.005273 seconds (65.70 k allocations: 4.073 MB)
  0.000086 seconds (4 allocations: 160 bytes)
Scott Jones
  • 1,750
  • 14
  • 14
  • The first `@time` result (for `issubvec`) looks like it might include compilation - it is too much of an outlier for such a simple call. Can you recheck (with a compile run before timing)? – Dan Getz Apr 02 '16 at 10:30
  • Not an outlier - I of course compiled before running (without the @time macro). I also tested various lengths, that one I showed was testing with vector of length 64K, searching for a sequence of 4 (the last 4 values in the vector). `issubvec` seems to have O(n) allocations, where n is the length of y. – Scott Jones Apr 02 '16 at 10:36
  • OK. The test case is important. If you add the test run code exactly, I can see if using Julia 0.5 vs. 0.4 can be important in this case. – Dan Getz Apr 02 '16 at 10:38
  • Maybe better to publish as a gist, and put the link here? – Scott Jones Apr 02 '16 at 10:40
  • Let's just leave it. Really, `subset2` is more optimized (and if one wants to push, there are some more optimizations), but it might be for another discussion. – Dan Getz Apr 02 '16 at 10:46
  • Yes, at least it's a starting point for people who need something fast, and I think your `issubvec` is demonstrative of what can be done with the array functions in the standard library. – Scott Jones Apr 02 '16 at 10:50
  • Does someone know whether this approach is broken with current julia versions i.e. what could be the reason. julia 1.5.3 yields: `ERROR: LoadError: MethodError: no method matching findnext(::Array{String,1}, ::String, ::Int64) Closest candidates are: findnext(::Regex, ::Union{String, SubString}, ::Integer) at regex.jl:299 findnext(::Regex, ::AbstractString, ::Integer) at regex.jl:324 findnext(::Base.RegexAndMatchData, ::Any, ::Any) at regex.jl:460 ` – Scipham May 26 '21 at 15:30
  • This doesn't work on julia 1.7. Tried to write upgraded code here, but stackoverflow won't format correctly multiline code in comments. Pb is that `findnext` has changed. One could write, for instance, the beginning of the `while` loop like this: `while !isnothing(begin cur = findnext(==(first), y, cur) end)`. – Rodrigo Vargas May 23 '22 at 08:29
6

For the third condition i.e. vector [4,6] appears as a sub-vector of 4,6,5 the following function is suggested:

issubvec(v,big) = 
  any([v == slice(big,i:(i+length(v)-1)) for i=1:(length(big)-length(v)+1)])

For the second condition, that is, give a boolean for each element in els vectors which appears in set vector, the following is suggested:

function vecin(els,set)
  res = zeros(Bool,size(els))
  res[findin(els,set)]=true
  res
end

With the vector in the OP, these result in:

julia> vecin([4,6],[4,6,5])
2-element Array{Bool,1}:
 true
 true

julia> issubvec([4,6],[4,6,5])
true
Dan Getz
  • 17,002
  • 2
  • 23
  • 41
  • 1
    issubvec does return the correct result, but is also not very performant, because it is making many allocations. It's a good idea to use `@time` to see performance suffers due to excessive allocations. – Scott Jones Apr 02 '16 at 00:12
  • 1
    `issubvec` is certainly unoptimized, @ScottJones, but its logic is very clear - which was my intention. The function you wrote is better (and even more optimized algorithms for searching substrings/subvectors exist). I think such a generic subvectors functions might fit in Base (with similar names to string functions). – Dan Getz Apr 02 '16 at 08:38
  • Actually, I had to struggle with the logic of `issubvec`, with the combination of array comprehension and using the `slice` and `any` functions. That's not meant as a criticism, I love seeing the powerful things that can be done with Julia's array functions, but coming from C/C++/Java etc. I had to twist my brain to comprehend it. Also, I've seen that short sweet code like that often doesn't scale, and I'm a performance guy – Scott Jones Apr 02 '16 at 10:15
3

note that you can now vectorize in with a dot:

julia> in([4,6,5]).([4, 6])
2-element BitArray{1}:
 true
 true

and chain with all to get your answer:

julia> all(in([4,6,5]).([4, 6]))
true
Adrien
  • 31
  • 1
1

I used this recently to find subsequences in arrays of integers. It's not as good or as fast as @scott's subset2(x,y)... but it returns the indices.

function findsequence(arr::Array{Int64}, seq::Array{Int64})
    indices = Int64[]
    i = 1
    n = length(seq)
    if n == 1
        while true
            occurrence = findnext(arr, seq[1], i)
            if occurrence == 0
                break
            else
                push!(indices, occurrence)
                i = occurrence +1
            end
        end
    else
        while true
            occurrence = Base._searchindex(arr, seq, i)
            if occurrence == 0
                break
            else
                push!(indices, occurrence)
                i = occurrence +1
            end
        end
    end
    return indices
end

julia> @time findsequence(rand(1:9, 1000), [2,3])
    0.000036 seconds (29 allocations: 8.766 KB)
    16-element Array{Int64,1}:
   80
  118
  138
  158
  234
  243
  409
  470
  539
  589
  619
  629
  645
  666
  762
  856
daycaster
  • 2,655
  • 2
  • 15
  • 18
  • 1
    Yes, that's very useful. I also wasn't aware of Base._searchindex, I'll have to benchmark it! I think an iterator would be good as well, so as not to create a potentially large vector (could be up to the length of seq). – Scott Jones Apr 02 '16 at 10:40
0

Here is a more up-to-date implementation using findall

function issubsequence(A, B)
    B1inA = findall(isequal(B[1]), A) # indices of the first element of b occuring in a
    matchFirstIndex = [] # Saves the first index in A of the occurances
    for i in B1inA
        if length(A[i:end]) < length(B) continue end
        if A[i:i + length(B) - 1] == B
            push!(matchFirstIndex, i)
        end
    end
    return matchFirstIndex
end

I get a similar runtime to @daycaster

@time issubsequence(rand(1:9, 1000), [2,3])
  0.000038 seconds (111 allocations: 20.844 KiB)
7-element Vector{Any}:
  57
 427
 616
 644
 771
 855
 982
b-fg
  • 3,959
  • 2
  • 28
  • 44
0

There is no standard Julia library function to determine if a particular sequence occurs as a subsequence of another. Probably this is because this is actually a fairly trickly problem (known as the String-searching algorithm) and quite how to do it depends on whether you'll be searching repeatedly, whether you want to do multiple matches, have multiple patterns, want fuzzy matches, etc.

The other answers here give reasonable answers but some are old and Julia has improved, and I wanted to offer a slightly more idiomatic solution.

function issubarray(needle, haystack)
    getView(vec, i, len) = view(vec, i:i+len-1)
    ithview(i) = getView(haystack, i, length(needle))
    return any(i -> ithview(i) == needle, 1:length(haystack)-length(needle)+1)
end

This is lightening fast and requires almost no memory - Julia's view is lightweight and efficient. And, as always with Julia, the solution is generally to simply define more functions.