
So sometimes I need to write a data structure I can't find on Hackage, or what I find isn't tested or high-quality enough for me to trust, or it's just something I don't want as a dependency. I am reading Okasaki's book right now, and it's quite good at explaining how to design asymptotically fast data structures.

However, I am working specifically with GHC, where constant factors and memory usage are both a big deal for my applications. So my questions are GHC-specific.

In particular:

  • How to maximize sharing of nodes
  • How to reduce memory footprint
  • How to avoid space leaks due to improper strictness/laziness
  • How to get GHC to produce tight inner loops for important sections of code (a toy example of the kind of loop I mean follows this list)
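
To make that last point concrete, here is the kind of loop I have in mind (a toy example of my own, not from any real library):

```haskell
{-# LANGUAGE BangPatterns #-}

-- Sum of squares from 1 to n. The bang pattern keeps the accumulator
-- strict, so with -O2 GHC can compile this to a tight loop over
-- unboxed Int# values instead of building a chain of thunks.
sumSquares :: Int -> Int
sumSquares n = go 0 1
  where
    go !acc i
      | i > n     = acc
      | otherwise = go (acc + i * i) (i + 1)
```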

I've looked around various places on the web, and I have a vague idea of how to work with GHC: for example, reading Core output, using UNPACK pragmas, and the like. But I'm not sure I really get it.
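
For example, my current mental model of UNPACK is roughly this (a toy definition of my own):

```haskell
-- A 2-D point. With plain fields, each Double sits behind a pointer
-- (and may be a thunk). Strict, UNPACKed fields tell GHC to store the
-- raw Double# values directly in the Point constructor, saving
-- indirections and heap words per value.
data Point = Point {-# UNPACK #-} !Double
                   {-# UNPACK #-} !Double
```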

So I popped open my favorite data structures library, containers, and looked at the Data.Sequence module. I can't say I understand a lot of what they're doing to make Seq fast.

The first thing that catches my eye is the definition of `FingerTree a`. I assume that's just me being unfamiliar with finger trees though. The second thing that catches my eye is all the SPECIALIZE pragmas. I have no idea what's going on here, and I'm very curious, as these are littered all over the code.
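
For reference, here is roughly the shape of the pragma as I understand it (a toy example of mine, not the containers source):

```haskell
-- An overloaded function that GHC normally compiles once, passing an
-- Ord dictionary around at runtime.
insertSorted :: Ord a => a -> [a] -> [a]
insertSorted x []     = [x]
insertSorted x (y:ys)
  | x <= y    = x : y : ys
  | otherwise = y : insertSorted x ys

-- Ask GHC to also emit a copy compiled just for Int, with the
-- dictionary resolved at compile time, so the comparisons become
-- direct calls instead of indirect dictionary lookups.
{-# SPECIALIZE insertSorted :: Int -> [Int] -> [Int] #-}
```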

Many functions also have an INLINE pragma associated with them. I can guess what that means, but how do I make a judgement call on when to INLINE functions?
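
My best guess, sketched on a toy function of my own (an assumption on my part, not something from containers): INLINE seems most useful on small higher-order functions, where inlining exposes the function argument at the call site.

```haskell
-- A tiny higher-order wrapper. INLINE tells GHC to copy the body to
-- every call site; when a call site passes a known function, GHC can
-- then optimize it in place instead of making an indirect call
-- through a function pointer.
applyTwice :: (a -> a) -> a -> a
applyTwice f x = f (f x)
{-# INLINE applyTwice #-}
```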

Things get really interesting around line ~475, in a section headed 'Applicative Construction'. They define a newtype wrapper for the Identity monad, they write their own copy of the strict state monad, and they define a function called `applicativeTree` which is apparently specialized to the Identity monad, a specialization that increases sharing of the function's output. I have no idea what's going on here. What sorcery is being used to increase sharing?

Anyway, I'm not sure there's much to learn from Data.Sequence. Are there other 'model programs' I can read to gain wisdom? I'd really like to know how to soup up my data structures when I really need them to go faster. One thing I'm particularly interested in is writing data structures that fuse well, and how to go about writing good fusion rules.
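
For concreteness, here is my understanding of the basic RULES mechanism, on a toy map of my own (real libraries use the fancier build/foldr machinery):

```haskell
module MapFusion where

-- A hypothetical map over lists. NOINLINE keeps the name visible in
-- optimized code so the rewrite rule below can match call sites.
myMap :: (a -> b) -> [a] -> [b]
myMap _ []     = []
myMap f (x:xs) = f x : myMap f xs
{-# NOINLINE myMap #-}

-- Fuse two passes into one, eliminating the intermediate list. GHC
-- applies rules like this during simplification when compiling with
-- optimization on, matching the left-hand side syntactically.
{-# RULES
"myMap/myMap" forall f g xs. myMap f (myMap g xs) = myMap (f . g) xs
  #-}
```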

danharaj
    If memory serves me, the canonical answer to this question is "get your problem added to [the language benchmarks game](http://shootout.alioth.debian.org/) and wait for Don to optimize it for you". ;] – C. A. McCann Jun 21 '11 at 21:45
  • hahaha. Maybe I should eat Dons, thus gaining his uncanny power to create Haskell code that competes with C. – danharaj Jun 21 '11 at 21:49

2 Answers


That's a big topic! Most of it has been explained elsewhere, so I won't try to write a book chapter right here. Instead:

  • Real World Haskell, ch 25, "Performance" - discusses profiling, simple specialization and unpacking, reading Core, and some optimizations.

  • Johan Tibell is writing a lot on this topic; see, for example, his "Reasoning about laziness" slides and his "High-Performance Haskell" talk at CUFP: http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell.html


Don Stewart
  • So is the specialize pragma analogous to your optimizations for data structures, but done automatically and for functions? – danharaj Jun 22 '11 at 01:41
  • Yeah. Though it is somewhat unnecessary given the presence of INLINE – Don Stewart Jun 22 '11 at 01:51
  • SPECIALIZE leads to less code bloat than INLINE, as there's only one extra function body per type rather than one extra function body per call site. My rule of thumb: use INLINE on functions with higher-order arguments (to avoid an indirect call) and SPECIALIZE otherwise. With the addition of INLINABLE, I use that almost exclusively where I would before have used SPECIALIZE, as INLINABLE allows for specialization in a different module. – tibbe Jun 22 '11 at 06:22
  • My "High-Performance Haskell" talk at CUFP contains a superset of the slides from "Reasoning about laziness". You can find it here: http://blog.johantibell.com/2010/09/slides-from-my-high-performance-haskell.html – tibbe Jun 22 '11 at 06:26
  • Edward Z. Yang had an [interesting post](http://blog.ezyang.com/2011/06/a-pattern-for-increasing-sharing/) on mapping over trees in such a way that you get maximal sharing when the function results would be identical. – Jun 22 '11 at 11:15

`applicativeTree` is quite fancy, but mainly in ways that have to do with FingerTrees in particular, which are a rather fancy data structure themselves. We had some discussion of the intricacies over at cstheory. Note that `applicativeTree` is written to work over any Applicative. It just so happens that when it is specialized to Id, it can share nodes in a way that it otherwise couldn't. You can work through the specialization yourself by inlining the Id methods and seeing what happens. Note that this specialization is used in only one place -- the O(log n) replicate function. The fact that the more general function specializes neatly to the constant case is a very clever bit of code reuse, but that's really all.
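
Here's a toy analogue of the trick (my own sketch, not the actual Data.Sequence code) showing why a pure applicative permits sharing that an effectful one can't:

```haskell
import Control.Applicative (liftA2)

-- A toy perfect binary tree, standing in for the FingerTree internals.
data Tree a = Leaf a | Node (Tree a) (Tree a)

-- Build a tree of depth d from one applicative action, running the
-- action once per leaf. Note that 'sub' is a single *action* used for
-- both children.
treeA :: Applicative f => Int -> f a -> f (Tree a)
treeA 0 act = fmap Leaf act
treeA d act = liftA2 Node sub sub
  where sub = treeA (d - 1) act

newtype Id a = Id { runId :: a }
instance Functor Id where fmap f (Id x) = Id (f x)
instance Applicative Id where
  pure          = Id
  Id f <*> Id x = Id (f x)

-- Specialized to Id, 'sub' is just a value, so (under GHC's usual
-- call-by-need evaluation) both children of every Node point at the
-- same heap object: a 2^d-leaf tree in O(d) space. With an effectful
-- Applicative such as IO or State, the action's effects must run
-- separately for each child, so no such sharing is possible.
replicateTree :: Int -> a -> Tree a
replicateTree d x = runId (treeA d (Id x))
```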

In general, Sequence teaches more about designing persistent data structures than about all the tricks for eking out performance, I think. Don's suggestions are of course excellent. I'd also just browse through the source of the really canonical and tuned libs -- Map, IntMap, Set, and IntSet in particular. Along with those, it's worth taking a look at Milan's paper on his improvements to containers.

sclv
  • And HashMap and HashSet, my new favorites. – augustss Jun 21 '11 at 23:48
  • @augustss: I agree, except when you browse the source, HashMap and HashSet are, as Milan's paper documents, pretty much thin (but excellent) wrappers around Maps and Sets. – sclv Jun 22 '11 at 00:02
  • Have you seen Johan's hashmap? http://hackage.haskell.org/packages/archive/unordered-containers/0.1.3.0/doc/html/src/Data-HashMap-Common.html#HashMap – Don Stewart Jun 22 '11 at 00:11
  • @Don -- oh, of course. Brain slip on my part there. – sclv Jun 22 '11 at 01:57
  • I wrote the `applicativeTree` implementation, back in the day -- and sclv is right that it's mostly there because it's clever. =P Yes, it does neatly generalize to make the replicate function O(log n), but it's not a trick that would work for most data structures. – Louis Wasserman Jan 20 '12 at 17:57