Why is the F# version of this program 6x faster than the Haskell one?

Question

Haskell version(1.03s):

module Main where
  import qualified Data.Text as T
  import qualified Data.Text.IO as TIO
  import Control.Monad
  import Control.Applicative ((<$>))
  import Data.Vector.Unboxed (Vector,(!))
  import qualified Data.Vector.Unboxed as V

  solve :: Vector Int -> Int
  solve ar =
    V.foldl' go 0 ar' where
      ar' = V.zip ar (V.postscanr' max 0 ar)
      go sr (p,m) = sr + m - p

  main = do
    t <- fmap (read . T.unpack) TIO.getLine -- With Data.Text, the example finishes 15% faster.
    T.unlines . map (T.pack . show . solve . V.fromList . map (read . T.unpack) . T.words)
      <$> replicateM t (TIO.getLine >> TIO.getLine) >>= TIO.putStr

F# version(0.17s):

open System

let solve (ar : uint64[]) =
    let ar' = 
        let t = Array.scanBack max ar 0UL |> fun x -> Array.take (x.Length-1) x
        Array.zip ar t

    let go sr (p,m) = sr + m - p
    Array.fold go 0UL ar'

let getIntLine() =
    Console.In.ReadLine().Split [|' '|]
    |> Array.choose (fun x -> if x <> "" then uint64 x |> Some else None)    

let getInt() = getIntLine().[0]

let t = getInt()
for i=1 to int t do
    getInt() |> ignore
    let ar = getIntLine()
    printfn "%i" (solve ar)

The above two programs are the solutions for the Stock Maximize problem and times are for the first test case of the Run Code button.

For some reason the F# version is roughly 6x faster, but I am pretty sure that if I replaced the slow library functions with imperative loops that I could speed it up by at least 3 times and more likely 10x.

Could the Haskell version be similarly improved?

I am doing the above for learning purposes and in general I am finding it difficult to figure out how to write efficient Haskell code.

Just for the record, what are you measuring? The execution of the whole program (from command line) or the runtime of the body (using something not shown in the snippet)? — Tomas Petricek, May 30 '16 at 13:25
I would have imagined that for purely functional code, Haskell would be ahead of F# at least. In the current `4.0` version, F# does not even have fold, scan and zip inlined. I would imagine that for roughly similar, purely functional code Haskell should exceed F# due to the optimizations afforded to it by its purity. — Marko Grdinić, May 30 '16 at 13:26
What I am measuring is when [I go to] (https://www.hackerrank.com/challenges/stockmax), paste those programs in and press RunCodeRunCodeRun Code. Not the full submission. The time will be printed at the top. — Marko Grdinić, May 30 '16 at 13:27
I would profile the IO code and actual calculation separately. Your input code currently converts from Text to String (with T.unpack) and then uses "read" which is known to be very slow. — shang, May 30 '16 at 13:27
The whole first test case is 1.5Mb. How would one speed up that part of the program? I actually thought that converting from `Text` to `String` would slow it down, but I was not sure to do it otherwise. Using the Haskell `Text.Reader` functions? — Marko Grdinić, May 30 '16 at 13:30
Why does the Haskell version use `replicateM` to read _everything_ and start computing only after that? The F# version does not do that. Can't you move `solve` so that it is called (and its result printed) inside the `replicateM`? Right now, it seems you are dealing with large lists / Text strings which are not present in F#. — chi, May 30 '16 at 13:37
Most importantly: do we know the compiler options for F# and Haskell? What are the optimization levels? — András Kovács, May 30 '16 at 13:44
Here are the compiler [versions](https://www.hackerrank.com/environment). Not sure about the compiler options. I'll see if I can look it up. — Marko Grdinić, May 30 '16 at 13:47
@behzad.nouri The code segments are what makes them long, but I'll take note of this for next time. Personally, I kind of like adding stuff to my post until I am satisfied with the amount of information. — Marko Grdinić, May 30 '16 at 14:29
@MarkoGrdinic fyi, most people look away from [wall of text questions](http://meta.stackoverflow.com/questions/300546/detect-walls-of-text). you will get better responses if you keep the question short and to the point. — behzad.nouri, May 30 '16 at 14:35
If you want to add a wall of text then use `TL;DR` as I find it helps; e.g. [How to add LanguagePrimitives.GenericZero / get_Zero to System.String?](http://stackoverflow.com/questions/37117089/how-to-add-languageprimitives-genericzero-get-zero-to-system-string) I also find that if I talk first person it probably goes into `TL;DR` and if it is third person probably goes in the actual question. — Guy Coder, May 30 '16 at 16:06

score 75 · Accepted Answer · edited May 23 '17 at 11:46

75

If you switch to ByteString and stick with plain Haskell lists (instead of vectors) you will get a more efficient solution. You may also rewrite the solve function with a single left fold and bypass zip and right scan⁽¹⁾. Overall, on my machine, I get 20 times performance improvement compared to your Haskell solution⁽²⁾.

Below Haskell code performs faster than the F# code:

import Data.List (unfoldr)
import Control.Applicative ((<$>))
import Control.Monad (replicateM_)
import Data.ByteString (ByteString)
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C

parse :: ByteString -> [Int]
parse = unfoldr $ C.readInt . C.dropWhile (== ' ')

solve :: [Int] -> Int
solve xs = foldl go (const 0) xs minBound
    where go f x s = if s < x then f x else s - x + f s

main = do
    [n] <- parse <$> B.getLine
    replicateM_ n $ B.getLine >> B.getLine >>= print . solve . parse

_{1. See edits for an earlier version of this answer which implements solve using zip and scanr.}
_{2. HackerRank website shows even a larger performance improvement.}

edited May 23 '17 at 11:46

Community

1
1

answered May 30 '16 at 13:58

behzad.nouri

74,723
18
126
124

2

I understand that String functions are supposed to be slow, but are Text functions supposed to be slow as well? – Marko Grdinić May 30 '16 at 14:04
1

@MarkoGrdinic with ascii only text, my _guess_ is that `ByteString.Char8` performs faster than `Text`; but rely on your own benchmarks. – behzad.nouri May 30 '16 at 14:11
4

You can make this about five times as fast - 0.02 rather than 0.1 - if you revert to using unboxed vectors for the main `solve` function, now that the problem is well-analysed sprunge.us/PUYW – Michael May 30 '16 at 19:41
@Michael the code has been improved and that 0.1 second timing no longer holds – behzad.nouri May 31 '16 at 10:11
Could you please help me understand the particular construction of `foldl` in your `solve` function? I am used to thinking of `foldl` as taking the arguments: `(\accumulator element -> something) accumulator list` (as in `foldl (+) 0 [1,2,3]`). It looks like `go (const 0)` needs two more arguments, which would qualify it as the function argument for `foldl`, but if that's so, I'm confused about where is the accumulator passed, and why is there an extra argument after the list? – גלעד ברקן May 31 '16 at 16:52
2

Realize that in the normal type of foldl :: Foldable t => (b -> a -> b) -> b -> t a -> b, "b" could be the type "c -> d", in which case the type would be (with extraneous parens removed) Foldable t => ((c ->d) -> a -> c ->d) -> (c -> d) -> t a -> c -> d, which matches the shape of the arguments he is passing. In other words, his accumulator is a function that is changed as he runs down the list, starting as a (const 0). – David McHealy May 31 '16 at 18:12
1

@גלעדברקן There was a [different question](http://stackoverflow.com/questions/37501967/how-to-make-fromlist-lazy-in-this-dynamic-programming-example) a few days ago to which Behzad wrote the answer in a similar style and I spent a long time analyzing the usage of lambda as an accumulator. Personally, I do think it is confusing and it would make more sense to use tuples instead to accumulate extra values. Understanding it also requires understanding the basics of lambda calculus. – Marko Grdinić Jun 01 '16 at 06:20
@mindreader finally got it; working out how `go` is stacking function calls is cool but a little cumbersome; I hope to develop some intuition about it; also I wonder if that kind of function stack is less efficient for large data than something like the simple backwards loop by Harrop, which of course does not detract from its conceptual awesomeness! – גלעד ברקן Jun 02 '16 at 04:50

score 53 · Answer 2 · edited Jun 02 '16 at 02:36

53

If I wanted to do that quickly in F# I would avoid all of the higher-order functions inside solve and just write a C-style imperative loop:

let solve (ar : uint64[]) =
  let mutable sr, m = 0UL, 0UL
  for i in ar.Length-1 .. -1 .. 0 do
    let p = ar.[i]
    m <- max p m
    sr <- sr + m - p
  sr

According to my measurements, this is 11x faster than your F#.

Then the performance is limited by the IO layer (unicode parsing) and string splitting. This can be optimised by reading into a byte buffer and writing the lexer by hand:

let buf = Array.create 65536 0uy
let mutable idx = 0
let mutable length = 0

do
  use stream = System.Console.OpenStandardInput()
  let rec read m =
    let c =
      if idx < length then
        idx <- idx + 1
      else
        length <- stream.Read(buf, 0, buf.Length)
        idx <- 1
      buf.[idx-1]
    if length > 0 && '0'B <= c && c <= '9'B then
      read (10UL * m + uint64(c - '0'B))
    else
      m
  let read() = read 0UL
  for _ in 1UL .. read() do
    Array.init (read() |> int) (fun _ -> read())
    |> solve
    |> System.Console.WriteLine

edited Jun 02 '16 at 02:36

ildjarn

62,044
9
127
211

answered May 30 '16 at 18:36

J D

48,105
13
171
274

1

According to [HackerRank](https://www.hackerrank.com/challenges/stockmax), it is slightly faster, 0.14 rather than 0.17 – Michael May 30 '16 at 19:26
15

As soon as I saw a question featuring the words F# and Haskell, I began to scan the page for Dr. Harrop's response! – Shredderroy May 30 '16 at 19:32
2

FYI; because of an issue in the F# compiler there's an extra 5% to gain by writing this as a tail-recursive function that loops towards 0. – Just another metaprogrammer May 30 '16 at 20:24
4

@Michael: My program takes ~0.000001s to run on this machine so HackerRank is measuring JIT compile time + run time for F# vs just run time for Haskell. – J D May 30 '16 at 21:13
1

The HackerRank site clearly distinguishes compilation from run time for the F# program, just enter it into the text box, click the compile button and watch. The fastest variant Haskell program mentioned on this page sprunge.us/PUYW appears to be 7 times as fast as this F#. – Michael May 30 '16 at 22:59
1

Note that the whole difference between the initial Haskell and F# programs was due to the absurdly elaborate input handling in the Haskell program. It was using methods appropriate for a mixture of Ancient Greek and Thai text, not digits. Note also that in your program, the "input" is compiled together with `solve` etc. which, as we know ... makes numerous optimizations possible. You are calculating a constant. – Michael May 30 '16 at 23:05
@Michael: "Note that the whole difference between the initial Haskell and F# programs was due to the absurdly elaborate input handling in the Haskell program. It was using methods appropriate for a mixture of Ancient Greek and Thai text, not digits". On the contrary, both original programs used Unicode. Only the Haskell has been optimised to operate directly on bytes. – J D May 30 '16 at 23:19
Ah, yes, good point; I was going by confused memory while making my main point about computing constants - which, for good reason never happens in benchmarking compiled programs. Certainly the present form of input handling is not taking any time to speak of, since it is possible to speed up @behzad.nouri 's program 5x just by using `Data.Vector.Unboxed` rather than `Data.List` for `solve` as in http://sprunge.us/PUYW – Michael May 30 '16 at 23:38
1

This rectifies the unicode business without intruding on behzad nouri's structure http://sprunge.us/aOKI . It is three times as fast as your fsharp .15 v .05 and just uses dumb preludish higher order functions for `solve` – Michael May 31 '16 at 00:09
I've updated my F# code to do the equivalent and both the F# and Haskell now take 0.02s. – J D May 31 '16 at 00:11
Good. The input management was more confusing than it seemed. So now the question is whether the Haskell would be improved by using your approach, using a mutable vector and strefs. – Michael May 31 '16 at 00:14
1

@Michael, `STRef`s representing primitive values tend to suck, especially if they're modified a lot. You're better off using `StateT v (ST s)`, or faking `STRefs` using mutable unboxed vectors/arrays (there's a package for that somewhere). – dfeuer May 31 '16 at 00:18
8

Not sure how "write it like C" is a legitimate response. Of course it will be faster, but the whole point of using F# or Haskell is so I don't have to write like C. – Elliot Cameron May 31 '16 at 14:15
7

@3noch: As Yaron Minsky famously said "don't be puritanical about purity". – J D May 31 '16 at 16:47
3

@3noch: Sure, that's the whole point of F# or Haskell, but asking a question "How can I make it faster" about reasonably fast code **is** inevitably asking "How can I make it more like C". How's that not a legitimate response? – scrwtp Jun 01 '16 at 07:43
2

@scrwtp Well the question was about how to make two very non-C-like implementations have similar performance characteristics, not how to turn either of them into C. – Elliot Cameron Jun 02 '16 at 13:58

score 45 · Answer 3 · answered May 30 '16 at 15:20

45

Just for the record, the F# version is also not optimal. I don't think it really matters at this point, but if people wanted to compare the performance, then it is worth noting that it can be made faster.

I have not tried very hard (you can certainly make it even faster by using restricted mutation, which would not be against the nature of F#), but simple change to use Seq instead of Array in the right places (to avoid allocating temporary arrays) makes the code about 2x to 3x faster:

let solve (ar : uint64[]) =
    let ar' = Seq.zip ar (Array.scanBack max ar 0UL)    
    let go sr (p,m) = sr + m - p
    Seq.fold go 0UL ar'

If you use Seq.zip, you can also drop the take call (because Seq.zip truncates the sequence automatically). Measured using #time using the following snippet:

let rnd = Random()
let inp = Array.init 100000 (fun _ -> uint64 (rnd.Next()))
for a in 0 .. 10 do ignore (solve inp) // Measure this line

I get around 150ms for the original code and something between 50-75ms using the new version.

answered May 30 '16 at 15:20

Tomas Petricek

240,744
19
378
553

3

@JonHarrop I was hoping you'll join the game :-). Don't worry about the downvotes, the whole question will probably get deleted soon because it "does not fit the Q&A format" or something. – Tomas Petricek May 31 '16 at 01:27
4

@AdamCopley: In all fairness no it shouldn't. – user541686 May 31 '16 at 07:33
4

The op has asked for a review of his working code with regard to performance considerations, and how to write more efficient Haskell code. They key being it is working code. How is this not a review request? – Adam Copley May 31 '16 at 07:36
3

@AdamCopley OP has asked about performance differences between Haskell and F#. Sample snippets of code in each language, proposed to be more or less equivalent, were provided. In benchmarking the sample code is critical, so obviously the samples will need to be discussed, tweaked, or replaced. But the question is not about the code and is not a request for code review. It would be totally off topic there. – Yitz May 31 '16 at 10:35
3

If an asker of a question asks how a snippet of working code can be improved, that to me is a review request, plain and simple, no arguments. – Adam Copley May 31 '16 at 11:37

Why is the F# version of this program 6x faster than the Haskell one?

3 Answers3

Linked