4

I have converted an Ocaml program into F#, and overall performance is the same as Ocaml.

However, in order to get to this point, I had try replace exceptions by Option values.

The program works a lot with list, maps etc that has int*int*int (=tuples of three ints).

I have a performance leak I do not understand. Can anyone explain how I can have 90% of my total execution time inside a function called

System.Tuple`3.System.Collections.IStructuralEquatable.Equals(
   object, class System.Collections.IEqualityComparer)

and what I can do about it?

chollida
  • 7,834
  • 11
  • 55
  • 85
mattias
  • 870
  • 5
  • 18
  • 7
    Facetious answer - your code is `while true do (1,1,1)=(2,2,2) |> ignore`. Without code this is hard to answer. – John Palmer May 15 '13 at 05:52
  • 2
    Or even `while (1,1,1) = (1,1,1) do ()` :) – Ramon Snir May 15 '13 at 07:47
  • 2
    Something maybe irrelevant: .Net exception handling is slower than Ocaml when errors are caught (which means it is not good idea to control flow using exceptions in F#) http://stackoverflow.com/questions/12160390/ocaml-performance-of-exceptions – colinfang May 15 '13 at 09:11
  • 1
    @mattias we need to see actual code. – Onorio Catenacci May 15 '13 at 15:38
  • `IStructuralEquatable` handles `(1,1,1) = (1,1,1)` for example. Knowing whether that is substantial would require more example code. – Guvante May 16 '13 at 22:52
  • Btw, do not overtrust OCaml exception performance. It can be slow when you enable stack trace. – camlspotter May 17 '13 at 11:01
  • This reminds me of how some of our tools, in OCaml, used to spend 15% or so of their time in the polymorphic compare function... – David Monniaux Jul 16 '13 at 00:59
  • You almost certainly want to replace uses of `int * int * int` with your own struct type. – J D Mar 17 '14 at 12:32

1 Answers1

4

Well as people have noted it is pretty hard to diagnose a problem with no code, but I'll do my best :-)

Your question made me run a test that I had been planning on running for a while, which was to test the performance of reference types vs values types as keys to associative containers. My hypothesis, based on vague feelins about the .net runtime was that for smallish key sizes (your 3 ints) that the value type would win out. I appear to have been wrong ([edit] actually further testing proved it to be correct! [/edit])

Let's see some code (as per usual, micro performance test, so take with grain of salt):

I used 5 different containers of various flavours to store the ints (F# is nice enough to create equality and comparers for struct types as well as records):

type KeyStruct(_1':int, _2':int, _3':int) = struct
    member this._1 = _1'
    member this._2 = _2'
    member this._3 = _3'
end

type KeyGenericStruct<'a>(_1':'a, _2':'a, _3':'a) = struct
    member this._1 = _1'
    member this._2 = _2'
    member this._3 = _3'
end

type KeyRecord = { _1 : int; _2 : int; _3 : int }

type KeyGenericRecord<'a> = { _1 : 'a; _2 : 'a; _3 : 'a }

plus I used the original tuple (int*int*int)

I then created the following test harness:

let inline RunTest<'a when 'a : equality> iterationCount createAssociativeMap (createKey:_->_->_->'a) =
    System.GC.Collect ()
    System.GC.WaitForFullGCComplete () |> ignore

    let data = [|
        for a in 0..99 do
            for b in 0..99 do
                for c in 0..99 do
                    yield a,b,c |]
    // shuffle
    let r = System.Random (0)
    for i = 0 to data.Length-1 do
        let j = r.Next (i, data.Length)
        let t = data.[i]
        data.[i] <- data.[j]
        data.[j] <- t

    let keyValues =
        data
        |> Array.mapi (fun i k -> k, 0.5-(float i)/(float data.Length))
        |> Array.toSeq

    let sw = System.Diagnostics.Stopwatch.StartNew ()
    let mapper = createAssociativeMap createKey keyValues
    let creationTime = sw.ElapsedMilliseconds

    let sw = System.Diagnostics.Stopwatch.StartNew ()
    let mutable checksum = 0.
    for i = 0 to iterationCount do
        let a, b, c = r.Next 100, r.Next 100, r.Next 100
        let key = createKey a b c
        checksum <- checksum + (mapper key)
    let accessTime= sw.ElapsedMilliseconds

    printfn "checksum %f elapsed %d/%d (%s)" checksum creationTime accessTime (typeof<'a>.Name)

let RunNTrials<'a when 'a : equality> = RunTest<'a> 1000000

and then ran it with some various types of associative containers:

let createDictionary create keyValues = 
    let d = System.Collections.Generic.Dictionary<_,_> ()
    keyValues
    |> Seq.map (fun ((_1,_2,_3),value) -> create _1 _2 _3, value)
    |> Seq.iter (fun (key,value) -> d.[key] <- value)
    (fun key -> d.[key])

let createDict create keyValues =
    let d = 
        keyValues
        |> Seq.map (fun ((_1,_2,_3),value) -> create _1 _2 _3, value)
        |> dict
    (fun key -> d.[key])

let createMap create keyValues =
    let d = 
        keyValues
        |> Seq.map (fun ((_1,_2,_3),value) -> create _1 _2 _3, value)
        |> Map.ofSeq
    (fun key -> d.[key])

let createCustomArray create keyValues =
    let maxA = 1 + (keyValues |> Seq.map (fun ((a,_,_),_) -> a) |> Seq.max)
    let maxB = 1 + (keyValues |> Seq.map (fun ((_,b,_),_) -> b) |> Seq.max)
    let maxC = 1 + (keyValues |> Seq.map (fun ((_,_,c),_) -> c) |> Seq.max)

    let createIndex a b c = a * maxB * maxC + b * maxC + c

    let values : array<float> = Array.create (maxA * maxB * maxC) 0.
    keyValues
    |> Seq.iter (fun ((a,b,c),d) -> values.[createIndex a b c] <- d)

    (fun (a,b,c) -> values.[a * maxB * maxC + b * maxC + c])

let RunDictionary<'a when 'a : equality> = RunNTrials<'a> createDictionary 
let RunDict<'a when 'a : equality> = RunNTrials<'a> createDict
let RunMap<'a when 'a : comparison> = RunNTrials<'a> createMap
let RunCustomArray = RunNTrials<_> createCustomArray

And ran the tests as follows:

printfn "Using .net's System.Collections.Generic.Dictionary"
RunDictionary (fun a b c -> { KeyRecord._1=a; _2=b; _3=c }) 
RunDictionary (fun a b c -> { KeyGenericRecord._1=a; _2=b; _3=c })
RunDictionary (fun a b c -> KeyStruct(a, b, c))
RunDictionary (fun a b c -> KeyGenericStruct(a, b, c))
RunDictionary (fun a b c -> (a, b, c))

printfn "Using f# 'dict'"
RunDict (fun a b c -> { KeyRecord._1=a; _2=b; _3=c }) 
RunDict (fun a b c -> { KeyGenericRecord._1=a; _2=b; _3=c })
RunDict (fun a b c -> KeyStruct(a, b, c))
RunDict (fun a b c -> KeyGenericStruct(a, b, c))
RunDict (fun a b c -> (a, b, c))

printfn "Using f# 'Map'"
RunMap (fun a b c -> { KeyRecord._1=a; _2=b; _3=c }) 
RunMap (fun a b c -> { KeyGenericRecord._1=a; _2=b; _3=c })
RunMap (fun a b c -> KeyStruct(a, b, c))
RunMap (fun a b c -> KeyGenericStruct(a, b, c))
RunMap (fun a b c -> (a, b, c))

printfn "Using custom array"
RunCustomArray (fun a b c -> (a, b, c))

And got the following results (the checksum is just to try to ensure I'm not doing anything too stupid) the "elapsed n/m" is "elapsed {container creations time}/{container access time}":

Using .net's System.Collections.Generic.Dictionary
checksum -55.339450 elapsed 874/562 (KeyRecord)
checksum -55.339450 elapsed 1251/898 (KeyGenericRecord`1)
checksum -55.339450 elapsed 569/1024 (KeyStruct)
checksum -55.339450 elapsed 740/1427 (KeyGenericStruct`1)
checksum -55.339450 elapsed 2497/2218 (Tuple`3)
Using f# 'dict'
checksum -55.339450 elapsed 979/628 (KeyRecord)
checksum -55.339450 elapsed 1614/1206 (KeyGenericRecord`1)
checksum -55.339450 elapsed 3237/5625 (KeyStruct)
checksum -55.339450 elapsed 3290/5626 (KeyGenericStruct`1)
checksum -55.339450 elapsed 2448/1914 (Tuple`3)
Using f# 'Map'
checksum -55.339450 elapsed 8453/2638 (KeyRecord)
checksum -55.339450 elapsed 31301/25441 (KeyGenericRecord`1)
checksum -55.339450 elapsed 30956/26931 (KeyStruct)
checksum -55.339450 elapsed 53699/49274 (KeyGenericStruct`1)
checksum -55.339450 elapsed 32203/25274 (Tuple`3)
Using custom array
checksum -55.339450 elapsed 484/160 (Tuple`3)

Multiple runs showed a small variation in numbers, but the major trends did hold. So mainly we're testing if boxing occurs (in map and dict it does) then the GetHashCode () implementation (generic version if slower than fully typed version; and Tuple worst of all) and in the case of Map the CompareTo.

Now where does this leave your question? Well potentially if all the time is being spent in the Equals of the Tuple, then changing to a Record type might help!

But probably not :-) [because really if it was an hash container then GetHashCode shouldn't be causing many clashes and it is was a map then it would be CompareTo)

Anyway, in real code you would obviously have different garbage collector affects, etc, etc. as I said take it for what it is...


I did some further testing, by wrapping starting the each test multiple times in Tasks and each individually in parallel over and over (starting more tasks than I have cores), and then taking the average time to complete.

The reason for doing this was to take the garbage collector time into account.

When I did this, the original hypothesis of the non generic struct key did win.

If anyone is really interested and unwilling to make the change I can post the code, but really I think I'm the only one reading this, so it's more just my personal notes :-)

Paul Westcott
  • 901
  • 1
  • 10
  • 19