Fastest way to compare two byte arrays in F#

Question

I am writing an application that uses the Kinect photo data and compares two frames against each other in F#. I am using Rob Miles Learn the Kinect Api page 74 as a guidepost but I am not using pointers and the performance is suffering. The byte from the Kinect frame is 1,228,800 bytes. I wrote the comparison like this:

member this.differenceCount(currentImageBytes) =
    if  previousImageBytes |> Seq.length = 0 then
        previousImageBytes <- currentImageBytes
        0
    else
        let bytes = Seq.zip previousImageBytes currentImageBytes
        let differenceCount = bytes |> Seq.mapi(fun i e -> i, e)
                                    |> Seq.filter(fun (i,e) -> i % 4 <> 0 )
                                    |> Seq.map snd
                                    |> Seq.filter(fun (p,c) -> p <> c)
                                    |> Seq.length
        previousImageBytes <- currentImageBytes
        differenceCount

When I run it, the screen lags because (I think) it is taking too long to process the arrays. In addition, the error rate is approaching 50%.

1) Am I approaching the problem wrong? 2) Is there a way to optimize my code to speed things up?

hi - this problem is not F# specific (see https://stackoverflow.com/questions/43289/comparing-two-byte-arrays-in-net) - indeed you can *translate* all of the answers ther e - but if you really need speed (and are on Windwos) than grap you the PInvoke answer from there (you can put it into a C# dll if you want) and use it - you'll never get something as fast inside .net — Random Dev, Oct 28 '14 at 11:56
I think that is the right track. I have the PInvoke working with memcmp but it is returning 0, 1 or -1. I need the total number of differences. Is there a native method that does that? — Jamie Dixon, Oct 28 '14 at 12:19
oh - I guess I have missed this - I don't know a native version for this no - it's a wild guess but I would bet that the best way to do something like this is to move it to GPU shaders (or write a low-level C/C++/Asm implementation) — Random Dev, Oct 28 '14 at 12:23
:-( I am not sure what a GPU shader is. I'll see if someone has written a C/C++ implementation that I can use like this: http://stackoverflow.com/questions/15313646/fast-counting-the-number-of-equal-bytes-between-two-arrays — Jamie Dixon, Oct 28 '14 at 12:25
A shader is a simple programm that runs on one of the shader units in a GPU - and as a GPU has very many of those you can get massive parallel processing power, and of course this is used to grank out your newest game - but I meant it as a kind of half-joke - right now I would look for a native implementation or go with what John proposed — Random Dev, Oct 28 '14 at 12:33
John's answer is too slow so I am looking for a native implementation. I can also use the unsafe C# code in the book (which I know works) also — Jamie Dixon, Oct 28 '14 at 12:54
You could investigate writing an SSE2 algorithm in C++/CLI and then consuming that from F# — Steve, Oct 28 '14 at 17:32
Using a loop will avoid all of the extra allocations. The GC pressure by this will likely be a huge culprit... — Reed Copsey, Oct 28 '14 at 18:38
For further information re. GPU processing in F# (as mentioned by @Carsten), see the [F# Foundation website](http://fsharp.org/use/gpu/): "GPU execution is a technique for high-performance financial, image processing and other data-parallel numerical programming." — Marc Sigrist, Oct 28 '14 at 19:16

score 10 · Accepted Answer · answered Oct 28 '14 at 18:44

Your sequence mapping/filtering via tuples causes a lot of boxing overhead. The following example avoids the boxing and works in parallel, which (on my machine) is 50 times faster.

let parallelRanges =
    let kinectSize = 1228800
    let subSize = kinectSize / Environment.ProcessorCount
    let iMax = kinectSize - 1
    let steps = [| -1 .. subSize .. iMax |]
    steps.[steps.Length - 1] <- iMax
    steps |> Seq.pairwise |> Seq.toArray |> Array.map (fun (x, y) -> x + 1, y)

let countDiffs (prevBytes:byte[]) (curBytes:_[]) =
    let count (fromI, toI) =
        let rec aux i acc =
            if i > toI then acc
            elif i % 4 <> 0 && prevBytes.[i] <> curBytes.[i] then 
                aux (i + 1) (acc + 1)
            else aux (i + 1) acc
        aux fromI 0

    parallelRanges
    |> Array.Parallel.map count
    |> Array.sum

Curious - how much faster is this without running in parallel? I'd expect it to still be quicker than the previous one. — Reed Copsey, Oct 28 '14 at 18:52
@Reed: When using only one core, it's still ca. 20x faster than the original. Using four cores, it's about 50x faster. — Marc Sigrist, Oct 28 '14 at 19:00

John Palmer · Answer 2 · 2014-10-28T10:30:25.580

3

Without some test data to profile and compare I would do something like

let len = Array.length previousImageBytes
let mutable count = 0
for i in 0 .. 4 .. (len-1) do
    if previousImageBytes.[i] <> currentImageBytes.[i] then
        count <- count+1

which has one pass over the data rather than 5 and avoids the seq functions which are slow.

edited Oct 28 '14 at 10:30

answered Oct 28 '14 at 10:18

John Palmer

25,356
3
48
67

One small change: for i in 0 .. 4 .. len - 1 do – Jamie Dixon Oct 28 '14 at 10:28
But still not fast enough. I might have to use another thread and just sample the frames. – Jamie Dixon Oct 28 '14 at 10:29
This is very hard without some data to benchmark against. and fixed the first bug – John Palmer Oct 28 '14 at 10:30
FYI - The original only had 2 passes over the data (`Seq.length` in the `if` and the maps/filters, which happen in one pass) – Reed Copsey Oct 28 '14 at 18:38

score 2 · Answer 3 · answered Nov 02 '14 at 12:32

I have experimented with several different implementations of byte array compare in F# for a project I've been working on.

Broadly speaking, for this particular problem, I found that "more idiomatic F#" == "slower". So I ended up with this:

// this code is very non-F#-ish.  but it's much faster than the
// idiomatic version which preceded it.

let Compare (x:byte[]) (y:byte[]) =
    let xlen = x.Length
    let ylen = y.Length
    let len = if xlen<ylen then xlen else ylen
    let mutable i = 0
    let mutable result = 0
    while i<len do
        let c = (int (x.[i])) - int (y.[i])
        if c <> 0 then
            i <- len+1 // breaks out of the loop, and signals that result is valid
            result <- c
        else
            i <- i + 1
    if i>len then result else (xlen - ylen)

I've violated several guidelines of good functional programming. I've used if-then-else instead of match. I've got mutables. And I faked a break statement with a cheesy change to a loop index variable.

Nonetheless, this ended up being dramatically faster than anything more idiomatic. Even something as simple as a closure with tail recursion was slower. (The profiler indicates that a closure involves a memory allocation on the heap.)

The project containing this code is on GitHub:

https://github.com/ericsink/LSM

The commit history of fs/lsm.fs should contain a record of my other failed attempts.

Why don't you use recursion, breaking out of the loop by just returning? — kaefer, Nov 02 '14 at 19:20
Note that in this example, we want to find the number of elements that are not equal, not just the index of the first one — John Palmer, Nov 02 '14 at 22:27

Fastest way to compare two byte arrays in F#

3 Answers3