0

I'm working on putting together an algorithm for searching through an array to find the index of a given value. I'm trying to follow a similar technique found in the .NET Core CLR.

Where I am confused is the value for the idx that is returned by the call to BitOperations.TrailingZeroCount. When I search for 2 it should return an index value of 1 for the data that I am searching. What I get is a 4. Do I need to do some conversion from the byte offset back to whatever the actual type is? Is this the most efficient way to write this? There are not many docs on using intrinsics with .NET.

open System
open System.Runtime.Intrinsics.X86
open System.Runtime.Intrinsics
open System.Numerics

#nowarn "9"
#nowarn "20"

[<EntryPoint>]
let main argv =

    let searchSpace : Span<int32> = Span [| 1 .. 8 |]
    let pSearchSpace = && (searchSpace.GetPinnableReference ())
    let search = Sse2.LoadVector128 (pSearchSpace)

    let value = 2
    let values = Vector128.Create value

    let comparison = Sse2.CompareEqual (values, search)
    let matches = Sse2.MoveMask (comparison.AsByte())
    
    if matches > 0 then
        let idx = BitOperations.TrailingZeroCount matches
        printfn "%A" idx // <--- This prints 4 instead of 1

    0 // return an integer exit code
Matthew Crews
  • 4,105
  • 7
  • 33
  • 57
  • 1
    BTW, you'd probably want `if matches != 0` if you were using a 256-bit vector. `vpmovmskb` will give you a 32-bit bitmask, so the sign bit could be set. Also, if you're linear searching an array larger than 1 vector, and expect the first match to be not right away, you might want to OR together some compare results, like how glibc memchr does (https://code.woboq.org/userspace/glibc/sysdeps/x86_64/multiarch/memchr-avx2.S.html), and sort out where the match was when you know there is one in a chunk of 2 or 4 vectors (1 or 2 cache lines). – Peter Cordes Aug 22 '21 at 14:38
  • That is exactly what I am expecting. I’ve been digging into SIMD for fast searching of an array of int to find the index for a specific value. In my case the array is sorted and the values are unique. The size of the array is likely in the range of 1K - 10K elements. – Matthew Crews Aug 22 '21 at 15:25
  • The array is *sorted*? So you're using this after a binary search gets you close? Or you're using an implicit quint-tree or something with SIMD to decide which of 5 subtrees to descend next... [What is the most efficient way to implement a BST in such a way the find(value) function is optimized for random values in the tree on x86?](https://stackoverflow.com/a/56614414) SIMD linear search over 1k elements is probably not as good as even a plain binary search. (Make it branchless if you expect the data to be hot in cache.) – Peter Cordes Aug 22 '21 at 15:29
  • The full description of the problem is that I have two arrays, A and B. Both are sorted arrays of unique int. For each element in A, I need to find the index of the match in B. Since they are sorted, I wanted to take advantage that I only have to search the remaining values in B for each remaining element in A. I've been looking at SIMD for performing many comparisons at once. – Matthew Crews Aug 22 '21 at 16:32
  • I see. Maybe brute force `pcmpgtd` to search until you find one that's greater, then `pcmpeqd` to see if there was a match in that last vector. Then you're not going all the way to the end, way past the place where the element would be if present. – Peter Cordes Aug 22 '21 at 16:51
  • The whole problem might make a good separate question. – Peter Cordes Aug 22 '21 at 16:54
  • Asked the follow up question here: https://stackoverflow.com/questions/68884225/simd-instructions-to-accelerate-search-of-int-array – Matthew Crews Aug 22 '21 at 18:47

1 Answers1

2

Sse2.MoveMask (comparison.AsByte()) (i.e. vpmovmskb) gives you 4 mask bits per int32 match result, one from each Byte like you asked for.

Either use vmovmskps (perhaps MoveMask(cmp.AsFloat()), if that's how F# does it?) or deal with the byte instead of element indices you get from bsf / tzcnt.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847