1

This is a follow up question to Haskell Thrift library 300x slower than C++ in performance test.

So recently I decided I was going to play around with Thrift and noticed that it was almost prohibitively slow and I stumbled upon the post mentioned above so I decide to deep dive :) The original question had some suggestions for improving the poster's code but there wasn't much in terms of suggestions for improving the library code. I'm not incredibly experienced with Haskell but thought I would give it a try.

I got the source code for Thrift (https://github.com/apache/thrift/tree/master/lib/hs) and rebuilt it with profiling enabled so I could get an idea of how the library was performing... in a single word, poorly, but that may be too generous. I decided to just focus on the ping test since that appeared to be the biggest offender. Here's the profiler output (with a few cost centered added to break down the expensive method).

First, here's how I built everything to get the times (and with profiling added to get the numbers below):

  • GHC Version: 7.8.4
  • Thrift Version: 0.9.2 (cloned from GitHub repo)

Building Thrift:

runghc Setup.lhs configure --user -O2 && runghc Setup.lhs build && runghc Setup.lhs install

Building Main:

ghc Main -O2

Times:

  • Haskell: 3960ms
  • C++: 5ms

... ouch :(

Here's the profiler output (with a few cost centered added to break down the expensive method).

COST CENTRE                            MODULE                         %time %alloc

SCC_readMessageBegin                   ItemStore                       24.0   29.8
SCC_writeMessageBegin                  ItemStore                       24.0   18.9
MAIN                                   MAIN                            20.0    2.7
SCC_THRIFT_parseBinaryValue_ver        Thrift.Protocol.Binary           8.0    4.3
CAF                                    GHC.Event.Thread                 4.0    0.0
SCC_THRIFT_parseBinaryValue_s          Thrift.Protocol.Binary           4.0    9.3
SCC_THRIFT_parseBinaryValue_decodeUtf8 Thrift.Protocol.Binary           4.0    1.5
proc_                                  ItemStore                        4.0    1.9
read_Ping_args                         ItemStore                        4.0    3.3
main                                   Main                             4.0    1.3
unstreamChunks/inner                   Data.Text.Internal.Lazy.Fusion   0.0    1.6
SCC_THRIFT_parseBinaryValue_sz         Thrift.Protocol.Binary           0.0    5.0
process_ping                           ItemStore                        0.0    2.1
write_Ping_result                      ItemStore                        0.0   14.2

COST CENTRE                                   MODULE                           no.     entries  %time %alloc   %time %alloc

MAIN                                          MAIN                             124           0   20.0    2.7   100.0  100.0
 main                                         Main                             250           0    0.0    0.0    72.0   95.0
  process                                     ItemStore                        251           1    0.0    0.8    72.0   94.9
   SCC_readMessageEnd                         ItemStore                        282         100    0.0    0.1     0.0    0.1
   SCC_proc_                                  ItemStore                        261         100    0.0    0.1    32.0   44.1
    proc_                                     ItemStore                        262         100    4.0    1.9    32.0   44.0
     process_ping                             ItemStore                        268         100    0.0    2.1    28.0   40.8
      SCC_tFlush                              ItemStore                        281         100    0.0    0.7     0.0    0.7
      SCC_writeMessageEnd                     ItemStore                        280         100    0.0    0.1     0.0    0.1
      SCC_write_Ping_result                   ItemStore                        277         100    0.0    0.0     0.0   14.2
       write_Ping_result                      ItemStore                        278         100    0.0   14.2     0.0   14.2
        from_Ping_result                      ItemStore                        279         100    0.0    0.0     0.0    0.0
      SCC_writeMessageBegin                   ItemStore                        273         100   24.0   18.9    24.0   20.2
       unstreamChunks/outer                   Data.Text.Internal.Lazy.Fusion   274         100    0.0    0.1     0.0    1.3
        unstreamChunks/inner                  Data.Text.Internal.Lazy.Fusion   275         400    0.0    0.8     0.0    1.2
         unstreamChunks/resize                Data.Text.Internal.Lazy.Fusion   276         100    0.0    0.4     0.0    0.4
      SCC_Iface.ping                          ItemStore                        271         100    0.0    0.0     0.0    0.1
       ping                                   Main                             272         100    0.0    0.1     0.0    0.1
      SCC_read_Ping_args                      ItemStore                        269         100    0.0    0.1     4.0    3.3
       read_Ping_args                         ItemStore                        270         100    4.0    3.3     4.0    3.3
     unstreamChunks/outer                     Data.Text.Internal.Lazy.Fusion   263         100    0.0    0.1     0.0    1.3
      unstreamChunks/inner                    Data.Text.Internal.Lazy.Fusion   264         400    0.0    0.8     0.0    1.2
       unstreamChunks/resize                  Data.Text.Internal.Lazy.Fusion   265         100    0.0    0.4     0.0    0.4
   SCC_readMessageBegin                       ItemStore                        252           1   24.0   29.8    40.0   49.9
    SCC_THRIFT_parseBinaryValue_ver           Thrift.Protocol.Binary           254           0    8.0    4.3    16.0   20.1
     SCC_THRIFT_parseBinaryValue_s            Thrift.Protocol.Binary           255         100    4.0    9.3     8.0   15.8
      SCC_THRIFT_parseBinaryValue_decodeUtf8  Thrift.Protocol.Binary           260           0    4.0    1.5     4.0    1.5
      SCC_THRIFT_parseBinaryValue_sz          Thrift.Protocol.Binary           256         100    0.0    5.0     0.0    5.0
       SCC_THRIFT_parseBinaryValue_decodeUtf8 Thrift.Protocol.Binary           259         100    0.0    0.0     0.0    0.0
       SCC_THRIFT_parseBinaryValue_ver        Thrift.Protocol.Binary           257         100    0.0    0.0     0.0    0.0

Yikes. You'll see the largest offenders (readMessageBegin (containing SCC_THRIFT_parseBinaryVal_XYZ) and writeMessageBegin) constitute 64% of the total time. All that these methods do is read 2 ints and a string!. This is clearly unacceptable.

If you look at Thrift/Protocol.hs (https://github.com/apache/thrift/blob/master/lib/hs/src/Thrift/Protocol.hs) and Thrift/Protocol/Binary.hs (https://github.com/apache/thrift/ blob/master/lib/hs/src/Thrift/Protocol/Binary.hs) you'll notice they are using Attoparsec to parse the binary header which consists of version :: Int32, a string, and the message size :: Int32.

First off it's strange (is it?) to use Attoparsec to read data that you already know the exact structure of. Second, in Protocol.hs, runParser reads from whatever transport protocol is being used 1 byte at a time!! and uses the Partial feature of Attoparsec and continuously reapplies the parser. Additionally, it appears they are going from a lazy ByteString to a strict ByteString then back to a lazy ByteString several times per parse. None of these things can be efficient.

I tried to do some modifications such as reading more than 1 byte at a time (but that just caused it to hang), to use tReadAll (from Transport.hs) to read the entire contents from the handle, but that hangs as well. I also tried a few other things (such as using Data.Binary directly) that ended up causing exceptions to be thrown about "no more data to read" and other such things.

Is there a fundamental reason why the Trift code is performing so poorly and what is the more idiomatic and efficient way to do this? Appologies for the long question, I just wanted to make sure there was enough context and I'm really interested in making libraries like this for Haskell better.

Community
  • 1
  • 1
m-renaud
  • 1,275
  • 8
  • 11
  • Isn't thrift a binary protocol? That's sounds like a job for cereal or binary, depending whether you're using strict or lazy bytestrings. [Here's a recent benchmark of parsing mp4, compared with C and Rust](http://np.reddit.com/r/haskell/comments/2z2f6f/benchmark_rustnom_vs_haskellattoparsec_vs_chammer/), [here are the results](https://github.com/Codas/nom_benchmarks/tree/dev). – Zeta Mar 25 '15 at 08:46
  • That being said, try to format your question so that the essence (*"Is there a fundamental reason why [thrift] is performing so poorly and what is the more idiomatic and efficient way to do [thrift-compatible network messaging]?"*) stands out. – Zeta Mar 25 '15 at 08:48
  • 1
    Also, you may be able to get more of a discussion going on one of [the Thrift project's mailing lists](http://thrift.apache.org/mailing). –  Mar 25 '15 at 10:33
  • Did you try using `Transport.Framed`? Reading 1 byte at a time shouldn't be too much of a problem if the whole communication is buffered. – firefrorefiddle Mar 25 '15 at 11:01
  • 1
    Last not least, we accept patches :-) – JensG Mar 25 '15 at 16:04
  • @Zeta - I'm using the Binary protocol here to get the numbers (updated the post to include them). I made a change where I used the Binary.Get monad to parse it but couldn't get it to function properly. If I tried to read all the contens into a lazy bytestring it would just hang, and if I used the method they use in runParser (i.e. read a byte, run the parser on it to get a partial parse, read another byte, repeat) it wouldn't work because Binary just fails if there isn't enough input to perform the read. – m-renaud Mar 25 '15 at 17:22

0 Answers0