FSharp runs my algorithm slower than Python

Question

Years ago, I solved a problem via dynamic programming:

https://www.thanassis.space/fillupDVD.html

The solution was coded in Python.

As part of expanding my horizons, I recently started learning OCaml/F#. What better way to test the waters, than by doing a direct port of the imperative code I wrote in Python to F# - and start from there, moving in steps towards a functional programming solution.

The results of this first, direct port... are disconcerting:

Under Python:

  bash$ time python fitToSize.py
  ....
  real    0m1.482s
  user    0m1.413s
  sys     0m0.067s

Under FSharp:

  bash$ time mono ./fitToSize.exe
  ....
  real    0m2.235s
  user    0m2.427s
  sys     0m0.063s

(in case you noticed the "mono" above: I tested under Windows as well, with Visual Studio - same speed).

I am... puzzled, to say the least. Python runs code faster than F# ? A compiled binary, using the .NET runtime, runs SLOWER than Python's interpreted code?!?!

I know about startup costs of VMs (mono in this case) and how JITs improve things for languages like Python, but still... I expected a speedup, not a slowdown!

Have I done something wrong, perhaps?

I have uploaded the code here:

https://www.thanassis.space/fsharp.slower.than.python.tar.gz

Note that the F# code is more or less a direct, line-by-line translation of the Python code.

P.S. There are of course other gains, e.g. the static type safety offered by F# - but if the resulting speed of an imperative algorithm is worse under F# ... I am disappointed, to say the least.

EDIT: Direct access, as requested in the comments:

the Python code: https://gist.github.com/950697

the FSharp code: https://gist.github.com/950699

Please use something like http://gist.github.com/ to upload your code... it really sucks having to download a tar.gz file to see your code — razenha, May 01 '11 at 17:56
It's myths, all myths. It's not compiled that is faster, or interpreted that is faster, or native that is faster, or jitted that is faster. Only faster is faster. Live by that. — R. Martinho Fernandes, May 01 '11 at 17:59
@razenha: The tarball includes a Makefile, with the compilation flags I used. Maybe I missed something there and somebody will point it out - hence the tarball. — ttsiodras, May 01 '11 at 18:04
"Faster is faster" - are you disagreeing with the statement "F# code is more or less a direct, line-by-line translation of the Python code"? Or is it "less"? — duffymo, May 01 '11 at 18:07
@duffy: I does seem like a direct line-by-line translation. But is that code amenable to be fast in F#? Maybe not. "the resulting speed of an imperative algorithm is worse under F#". F# is not an imperative language. Why should one expect it to be better at that than imperative languages? — R. Martinho Fernandes, May 01 '11 at 18:16
I don't have Python to test it, but the F# version completes in ~1.5 sec on my Intel Core 2 Duo CPU (2.26 GHz) (on Windows, using `fsi.exe` and the `#time` timing). However, I dind't try to understand your code - I think you'll more likely get a useful answer if you post some simple F# code that you're trying to optimize (because not everybody will want to analyse your two samples). — Tomas Petricek, May 01 '11 at 18:16
@Tomas: Thanks for your benchmark, but it only makes sense if you install ActivePython or Cygwin's python on your machine and run the python code as well. As for analyzing, you don't have to - the code is using mutable dictionaries, reading and writing in them, and both the Python and the F# versions emit the same output... I was hoping there are some optimization options I missed in the Makefile - do you see anything missing there? — ttsiodras, May 01 '11 at 18:22
Also, translating code line-by-line from Python is a good way to start exploring the F# syntax, but it doesn't really show you any of the benefits of F#. I believe you could have more fun if you tried to solve the problem using more idiomatic functional style (it probably won't be faster, but it will quite likely be more readable and shorter). — Tomas Petricek, May 01 '11 at 18:22
@Tomas: I have your book next to my keyboard :-) As you said, this was supposed to be a first step, wetting my appetite - but not seeing ANY speedup and instead seeing less speed than an interpreted language... was, well, unexpected. — ttsiodras, May 01 '11 at 18:26
On my machine, Python runs in 1.2 secs and the F# version in 1.8 secs. What this benchmark probably shows, is that Python has an excellent Dictionary implementation, maybe with optimizations for pairs as keys. — wmeyer, May 01 '11 at 18:39
@ttsiodras: Yes, it is definitely an unexpected result. Python obviously has some good optimizations in place. — Tomas Petricek, May 01 '11 at 19:12
@Tomas: I hope... that a ticket or something can be raised in the MS F# team about this. I don't know why Dictionary performs so bad, but it appears that some optimization is in order... — ttsiodras, May 01 '11 at 19:20
I've written the same program, using a similar algorithm as yours for the “heap” part (with additional input of ordered collections (cough—think tv series episodes—cough) using a combinatorial algorithm and filling the remaining space by picking best-fit from the “heap”). You don't have to worry for many millions of combinations, though, since you'll have at the most 2293760 combinations because of the 2k DVD sector size. Hm... you already knew that, but choose to work with MB. Perhaps our algorithms aren't as similar as I thought :) — tzot, May 01 '11 at 23:43
@Martinho: "But is that code amenable to be fast in F#?" Yes. My optimized F# is 100× faster... — J D, May 02 '11 at 10:23
@ΤΖΩΤΖΙΟΥ: No, picking best-fit doesn't always work. I can explain over e-mail, if you want (ttsiodras at gmail) — ttsiodras, May 05 '11 at 07:17
@JonHarrop Yes your optimized F# version is always faster, we know that. It's even 10x faster than C++. Even the CLR team hasn't made such a claim yet. — user, Oct 01 '12 at 05:31

ttsiodras · Accepted Answer · 2011-05-04T07:12:13.487

49

Dr Jon Harrop, whom I contacted over e-mail, explained what is going on:

The problem is simply that the program has been optimized for Python. This is common when the programmer is more familiar with one language than the other, of course. You just have to learn a different set of rules that dictate how F# programs should be optimized... Several things jumped out at me such as the use of a "for i in 1..n do" loop rather than a "for i=1 to n do" loop (which is faster in general but not significant here), repeatedly doing List.mapi on a list to mimic an array index (which allocated intermediate lists unnecessarily) and your use of the F# TryGetValue for Dictionary which allocates unnecessarily (the .NET TryGetValue that accepts a ref is faster in general but not so much here)

... but the real killer problem turned out to be your use of a hash table to implement a dense 2D matrix. Using a hash table is ideal in Python because its hash table implementation has been extremely well optimized (as evidenced by the fact that your Python code is running as fast as F# compiled to native code!) but arrays are a much better way to represent dense matrices, particularly when you want a default value of zero.

The funny part is that when I first coded this algorithm, I DID use a table -- I changed the implementation to a dictionary for reasons of clarity (avoiding the array boundary checks made the code simpler - and much easier to reason about).

Jon transformed my code (back :-)) into its array version, and it runs at 100x speed.

Moral of the story:

F# Dictionary needs work... when using tuples as keys, compiled F# is slower than interpreted Python's hash tables!
Obvious, but no harm in repeating: Cleaner code sometimes means... much slower code.

Thank you, Jon -- much appreciated.

EDIT: the fact that replacing Dictionary with Array makes F# finally run at the speeds a compiled language is expected to run, doesn't negate the need for a fix in Dictionary's speed (I hope F# people from MS are reading this). Other algorithms depend on dictionaries/hashes, and can't be easily switched to using arrays; making programs suffer "interpreter-speeds" whenever one uses a Dictionary, is arguably, a bug. If, as some have said in the comments, the problem is not with F# but with .NET Dictionary, then I'd argue that this... is a bug in .NET!

EDIT2: The clearest solution, that doesn't require the algorithm to switch to arrays (some algorithms simply won't be amenable to that) is to change this:

let optimalResults = new Dictionary<_,_>()

into this:

let optimalResults = new Dictionary<_,_>(HashIdentity.Structural)

This change makes the F# code run 2.7x times faster, thus finally beating Python (1.6x faster). The weird thing is that tuples by default use structural comparison, so in principle, the comparisons done by the Dictionary on the keys are the same (with or without Structural). Dr Harrop theorizes that the speed difference may be attributed to virtual dispatch: "AFAIK, .NET does little to optimize virtual dispatch away and the cost of virtual dispatch is extremely high on modern hardware because it is a "computed goto" that jumps the program counter to an unpredictable location and, consequently, undermines branch prediction logic and will almost certainly cause the entire CPU pipeline to be flushed and reloaded".

In plain words, and as suggested by Don Syme (look at the bottom 3 answers), "be explicit about the use of structural hashing when using reference-typed keys in conjunction with the .NET collections". (Dr. Harrop in the comments below also says that we should always use Structural comparisons when using .NET collections).

Dear F# team in MS, if there is a way to automatically fix this, please do.

edited May 04 '11 at 07:12

answered May 01 '11 at 18:53

ttsiodras

10,602
6
55
71

12

Note: 1. F# dictionaries are just .NET dictionaries. 2. Python dictionaries are not implemented in Python (probably in C). – wmeyer May 01 '11 at 18:58
Very interesting message of Jon. However, changing hashtables to arrays only works for the particular example (it's not relevant in the general case). – Laurent May 01 '11 at 19:02
Which means .NET dictionaries... are slower than interpreted Python's? Does anybody know why the .NET dictionary is so slow? – ttsiodras May 01 '11 at 19:04
1

I wonder if allocating a hashtable of appropriate size before starting would help. – Tomas Petricek May 01 '11 at 22:14
6

Apparently using `Dictionary(HashIdentity.Structural)` makes it a lot faster (probably faster than Python). Replacing tuples (which are heap allocated) with structs should also improve performance significantly. BTW, I think you should also accept this answer if you can. – J D May 02 '11 at 10:07
2

@Laurent: "However, changing hashtables to arrays only works for the particular example (it's not relevant in the general case)". Absolutely and this is a very important point. When most of your time must be spent doing operations that are efficiently implemented in an interpreted language (like FFTs in Matlab or hash tables in Python) you can expect competitive performance but *in the general case* they will be a lot slower. The original programs are a perfect example of this because most of the time is spent in hash table ops and these have been extremely heavily optimized in Python. – J D May 02 '11 at 10:11
When I say "competitive" I mean up to a few times slower and when I say "a lot slower" I mean orders of magnitude like Python in 100× slower than F# in this case. – J D May 02 '11 at 10:21
@Jon Harrop: "accept this answer" - I will, as soon as SO allows me to (2 days must pass before you can accept an answer you posted to yourself). – ttsiodras May 02 '11 at 12:05
@Jon Harrop: "Python 100x slower" - not fair. To say that, you'd have to convert the Python code back to arrays as well. But that's not the point: my post and your analysis prove conclusively that F# dictionaries are so slow that even interpreted languages can compete with them - and win! So the F#/.net people at MS should fix this, IMHO. – ttsiodras May 02 '11 at 12:26
1

@ttsiodras - .NET dictionaries are "so slow" _for this workload_. If there's a performance optimization that the .NET BCL team can make to speed up performance on this case without slowing them down in other cases, they should obviously do it. However, I'm not sure that's the case, so if tuples are rarely used as dictionary keys in .NET then I'm not sure it's worth changing. Furthermore, there may be simple workarounds (like Jon Harrop's `Dictionary(HashIdentity.Structural)` suggestion above). – kvb May 02 '11 at 13:32
12

@ttsiodras: I don't follow your logic. The *only* reason your Python beat your F# is that you forgot to provide the `HashIdentity.Structural` equality comparer that you should *always* provide in F#. Just making that one minor change makes the F# faster than your Python. If you then use structs instead of tuples and use the .NET `TryGetValue` instead of the F# extension method and presize the hash table, the F# becomes 7× faster than before which is several times faster than your Python. So you cannot conclude that `Dictionary` is inefficient. – J D May 02 '11 at 14:16
7

@kvb, @Jon: I googled a lot and found this: http://cs.hubfs.net/forums/thread/654.aspx (navigate to the bottom). Don Syme clearly admits that for tuples, F# *should* use Structural comparisons BY DEFAULT, just like Python does. He says that "We'll add it to our list" back in 2006, but 5 years later, apparently this hasn't made it in... And, funny thing, "This can be very confusing for newcomers from other languages and also can lead to subtle bugs in a larger code base.". Yep, indeed :-) – ttsiodras May 02 '11 at 14:43
3

@ttsiodras - actually, the tuple type has changed since then, so that the default equality and hashing behavior work as expected. That is, the example in that thread of getting different results from multiple calls to `(1, 2, 3).GetHashCode()` no longer occurs. It's just that the performance characteristics of the built-in equality and hashing operations aren't as fast as using `HashIdentity.Structural`. – kvb May 02 '11 at 15:12
@kvb: I am new to F#, and I understand structural comparison to determine equality by comparing the two elements of the tuple - but how exactly is the default behaviour different? Does it compare references (i.e. just "pointers")? If so, then it should have failed, since (1,2) and another (1,2) can in principle be allocated in different places on the heap... – ttsiodras May 02 '11 at 15:25
1

@Tomas Petricek: In your book, listing 10.3 uses a Dictionary without using HashIdentity.Structural (which Jon claims must always be used). Is he right in saying that - should the listing be updated? – ttsiodras May 02 '11 at 15:27
1

@ttsiodras - `System.Tuple<_,_>`'s `GetHashCode` and `Equals` implementations are defined in terms of the hash codes (and equality) of its constituent parts. For `System.Tuple`, this results in structural hashing/equality, so the behavior of the dictionary will be correct (although slow). However, for something like `System.Tuple` this isn't the case (since arrays use reference equality by default). Thus, `([|0|],[|0|]).GetHashCode()` will return different values on each call, while F#'s structural hashing (`hash ([|0|], [|0|]`) will return the same value each time. – kvb May 02 '11 at 16:07
@kvb: *"so the behavior of the dictionary will be correct (although slow)"* - if structural hashing is used by default on tuple, then why does passing "HashIdentity.Structural" to the Dictionary's constructor improve speed? The code is using keys of tuple, so I don't understand what tuple has to do with this... – ttsiodras May 02 '11 at 16:58
1

@ttsiodras - The code path used for `Tuple<_,_>.GetHashCode` is different from the code path used for F#'s structural hashing, even though in the case of `Tuple` the ultimate behavior is the same. F#'s structural hashing is faster in this case. I was just bringing up `Tuple` as an example of a tuple type which does not demonstrate structural equality when using the .NET `GetHashCode` method. – kvb May 02 '11 at 17:26
1

@ttsiodras: Thanks for the reference to listing 10.3. I'll definitely consider adding a note about that in some future version of the book. If I understand it correctly, the behavior is correct in both cases, but it is less efficient. – Tomas Petricek May 02 '11 at 22:17
@ttsiodras: I have advised Microsoft to include a `Dictionary` function to construct a .NET dictionary using `HashIdentity.Structural`. F# does provide `dict` but it returns a read-only collection, which I think is insane and suspect was added by their resident Haskell fanboy... – J D May 06 '11 at 08:14
2

@Tomas: I think it is definitely a good idea to advise people to get used to always using `HashIdentity.Structural` from F# because forgetting it leads to such tricky bugs. Has bitten me on more than one occasion. – J D May 06 '11 at 08:16
Although this is an old Q-A the problem still has to be mentioned. Dr Jon Harrop stated that was not optimized for F# (which was true) but assumed that was optimized for python (which was not true). Any way still in the F# code there is no [try...catch] (https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/exception-handling/the-try-with-expression) which exists in python's code. This may make a big diff in timing. – ilias iliadis Jan 02 '19 at 21:35

kvb · Answer 2 · 2011-05-02T14:28:19.257

As Jon Harrop has pointed out, simply constructing the dictionaries using Dictionary(HashIdentity.Structural) gives a major performance improvement (a factor of 3 on my computer). This is almost certainly the minimally invasive change you need to make to get better performance than Python, and keeps your code idiomatic (as opposed to replacing tuples with structs, etc.) and parallel to the Python implementation.

score 5 · Answer 3 · edited May 23 '17 at 12:26

5

Edit: I was wrong, it's not a question of value type vs reference type. The performance problem was related to the hash function, as explained in other comments. I keep my answer here because there's an interessant discussion. My code partially fixed the performance issue, but this is not the clean and recommended solution.

--

On my computer, I made your sample run twice as fast by replacing the tuple with a struct. This means, the equivalent F# code should run faster than your Python code. I don't agree with the comments saying that .NET hashtables are slow, I believe there's no significant difference with Python or other languages implementations. Also, I don't agree with the "You can't 1-to-1 translate code expect it to be faster": F# code will generally be faster than Python for most tasks (static typing is very helpful to the compiler). In your sample, most of the time is spent doing hashtable lookups, so it's fair to imagine that both languages should be almost as fast.

I think the performance issue is related to gabage collection (but I haven't checked with a profiler). The reason why using tuples can be slower here than structures has been discussed in a SO question ( Why is the new Tuple type in .Net 4.0 a reference type (class) and not a value type (struct)) and a MSDN page (Building tuples):

If they are reference types, this means there can be lots of garbage generated if you are changing elements in a tuple in a tight loop. [...] F# tuples were reference types, but there was a feeling from the team that they could realize a performance improvement if two, and perhaps three, element tuples were value types instead. Some teams that had created internal tuples had used value instead of reference types, because their scenarios were very sensitive to creating lots of managed objects.

Of course, as Jon said in another comment, the obvious optimization in your example is to replace hashtables with arrays. Arrays are obviously much faster (integer index, no hashing, no collision handling, no reallocation, more compact), but this is very specific to your problem, and it doesn't explain the performance difference with Python (as far as I know, Python code is using hashtables, not arrays).

To reproduce my 50% speedup, here is the full code: http://pastebin.com/nbYrEi5d

In short, I replaced the tuple with this type:

type Tup = {x: int; y: int}

Also, it seems like a detail, but you should move the List.mapi (fun i x -> (i,x)) fileSizes out of the enclosing loop. I believe Python enumerate does not actually allocate a list (so it's fair to allocate the list only once in F#, or use Seq module, or use a mutable counter).

edited May 23 '17 at 12:26

Community

1
1

answered May 01 '11 at 18:45

Laurent

2,951
16
19

About your List.mapi comment: I did try my own "seq"-based enumerate: let enumerate c = seq { let idx = ref 0 for elem in c do yield (!idx, elem) idx := !idx + 1 } ... but it had no impact - speed was still slow. As I said above, turns out the culprit is the bad performance of .NET Dictionary... – ttsiodras May 01 '11 at 19:06
@ttsiodras: I don't think so. With my change, the code is a bit faster than the Python implementation, this means dictionaries are not that slow in .NET. Of course, arrays are much faster than hashtables when you know the index, but you're changing the algorithm. – Laurent May 01 '11 at 20:24
And for the `mapi`, I think it's a detail and it won't change the speed much. That said, please avoid the "ref" type when you can (`mutable` is much faster). Here, you could use `Seq.mapi` (but I think moving `List.mapi` out of the loop is a better idea). – Laurent May 01 '11 at 20:26
@Laurent: The `ref` is a significant overhead here but the `List.mapi` is not. However, the main performance difference is simply because the OP forgot to create the `Dictionary` using `HashIdentity.Structural` which you always want in F# otherwise you're in danger of getting reference equality, and it makes his program several times faster. – J D May 02 '11 at 11:50
@Laurent: Thank you, and I agree with most of what you said. About the "should be almost as fast": indeed, but I'd expect compiled F# to be at least a bit faster than a JIT (assuming cPython has one) - and definitely not slower! Of course both F# and Python end up as bytecode, but having a compiler and a static type system allows (in principle) for superior code generation compared to any JIT. – ttsiodras May 02 '11 at 12:13
@Jon: If HashIdentity.Structural should always be used, then why isn't it the default for Dictionary? I am a newbie in F#, so is there some reason to NOT use Structural? – ttsiodras May 02 '11 at 12:49
1

@ttsiodras - structural identity is an F# concept which the underlying .NET framework is completely unaware of, so the .NET dictionary type can't use it as the default. – kvb May 02 '11 at 14:02
@ttsiodras: As kvb said, because `HashIdentity.Structural` is F# specific whereas `Dictionary` is a language-agnostic .NET thing. – J D May 02 '11 at 14:21
2

Is there really still a danger of getting reference equality in the current version of F#? With tuple keys in Dictionaries that does not seem to be the case. Can it happen in other circumstances with structural F# types? – wmeyer May 02 '11 at 14:54
@kvb, @Jon: if .NET Dictionary can't be made to work with Structural comparisons by default (at least when working with tuples), then I believe the F# standard library would benefit from a version that does... which should be the "default" version taught to F# programmers. – ttsiodras May 02 '11 at 15:34
1

@wmeyer - Tuples define their equality and hashing in terms of their constituent values' equality and hashing. If their values' equality and hashing are already structural, then the tuples will have structural equality and hashing, too (such as with `int*int`). However, if their values are not structural, then the tuple's equality and hashing will not be structural either (such as with `int[]*int[]`). F#'s `=` operator and `hash` function behave structurally even on these types, as does the `HashIdentity.Structural` equality comparer. – kvb May 02 '11 at 16:16
@kvb: The code uses int*int though, not int[]*int[]. Why does HashIdentity.Structural have any impact on speed in my code? – ttsiodras May 02 '11 at 17:04
@ttsiodras, because that's an F# feature and Dictionary is agnostic and not aware of this. That's why you have to tell it to use an alternative hash identity. Also, the last para in this answer suggests it changed to struct types, but it's a record type, which still uses the heap. In recent versions of F# you can have a tuple or a record type defined as struct which may help here. In fact, later versions of F# generally create much faster code, it'd be interesting to test your original naive implementation again. – Abel Dec 23 '18 at 02:33

score 0 · Answer 4 · answered Nov 03 '20 at 22:59

Hmm.. if the hashtable is the major bottleneck, then it is properly the hash function itself. Havn't look at the specific hash function but For one of the most common hash functions namely

((a * x + b) % p) % q

The modulus operation % is painfully slow, if p and q is of the form 2^k - 1, we can do modulus with an and, add and a shift operation.

Dietzfelbingers universal hash function h_a : [2^w] -> [2^l]

lowerbound(((a * x) % 2^w)/2^(w-l))

Where is a random odd seed of w-bit.

It can be computed by (a*x) >> (w-l), which is magnitudes of speed faster than the first hash function. I had to implement a hash table with linked list as collision handling. It took 10 minutes to implement and test, we had to test it with both functions, and analyse the differens of speed. The second hash function had as I remember around 4-10 times of speed gain dependend on the size of the table. But the thing to learn here is if your programs bottleneck is hashtable lookup the hash function has to be fast too

FSharp runs my algorithm slower than Python

4 Answers4

Linked