2

I need some help understanding some GHC heap profiles. I'm working on a new output format for an existing compiler for the Grammatical Framework language.

Before

Profiling a normal run of the current version of the compiler looks like this:

heap profile for PGF

  13,347,660,488 bytes allocated in the heap
 213,062,924,208 bytes copied during GC
     740,585,528 bytes maximum residency (567 sample(s))
       4,844,112 bytes maximum slop
            1438 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     12312 colls,     0 par    1.321s   1.382s     0.0001s    0.0025s
  Gen  1       567 colls,     0 par   206.442s  208.534s     0.3678s    0.8388s

  INIT    time    0.001s  (  0.005s elapsed)
  MUT     time   68.757s  ( 68.986s elapsed)
  GC      time  109.557s  (111.086s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time   98.206s  ( 98.830s elapsed)
  EXIT    time    0.000s  (  0.001s elapsed)
  Total   time  276.521s  (278.908s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    194,128,534 bytes per MUT second

  Productivity  60.4% of total user, 60.2% of total elapsed

After

With the changes I've made, it now looks like this:

heap profile for LPGF

  15,996,529,552 bytes allocated in the heap
 216,727,207,600 bytes copied during GC
     889,791,520 bytes maximum residency (578 sample(s))
       5,417,952 bytes maximum slop
            1741 MiB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0     14915 colls,     0 par    1.641s   1.721s     0.0001s    0.0014s
  Gen  1       578 colls,     0 par   226.744s  230.323s     0.3985s    1.1982s

  INIT    time    0.001s  (  0.005s elapsed)
  MUT     time   70.046s  ( 70.242s elapsed)
  GC      time  120.183s  (122.695s elapsed)
  RP      time    0.000s  (  0.000s elapsed)
  PROF    time  108.201s  (109.349s elapsed)
  EXIT    time    0.000s  (  0.000s elapsed)
  Total   time  298.431s  (302.291s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    228,370,682 bytes per MUT second

  Productivity  59.7% of total user, 59.4% of total elapsed

So this new version uses more time and memory than the previous one, and my goal is to reduce those as much as possible.

  • The first part of both graphs looks the same, this stage of the compiler (namely reading binary files) has not been touched and thus I would expect it to look identical.
  • After that, the graphs are hugely different. I feel like the huge cliff edge in graph #2 is telling me something, but I don't quite know what.
  • The blue area dominating the second half of graph #2 (24956) is my newly added code. As I haven't aggressively optimised that code, I would expect there to be scope for improvement there. But do these different graphs give any hints about laziness/strictness or where to investigate further?
John J. Camilleri
  • 4,171
  • 5
  • 31
  • 41
  • 1
    If you can *avoid* building a large structure, that will likely improve performance greatly. If you can reduce the size of the large structure (even by a constant factor) that will likely have a considerable impact. If you can store the large structure in a *compact region*, that'll be a big improvement. – dfeuer Aug 26 '21 at 01:05

1 Answers1

5

That profile shape indicates a large graph of values became unreachable suddenly. That could be a bunch of thunks produced accidentally. Or it could be some sort of intermediate structure that was evaluated and used properly, but eventually was no longer needed and released.

But the one thing you can say for sure is that it isn't the result of a mass of unevaluated thunks being built up and then traversed during evaluation. That has a shape like your first chart - an upward trend while allocating and then a downward trend while evaluating.

A cliff says that something big became suddenly unreachable. As for what? Who knows. We'd need to see code.

Carl
  • 26,500
  • 4
  • 65
  • 86
  • The code is open source [here](https://github.com/GrammaticalFramework/gf-core/tree/lpgf), though I fear a bit too large to explain in a question here. But your explanation matches my understanding of what's going on. Is one of these behaviours — gradual vs. sudden deallocation — somehow better or more desirable? Or does this chart not really reveal any obvious candidates for optimisation? Perhaps I should focus more on the profiling report rather than these heap charts? – John J. Camilleri Aug 25 '21 at 09:18
  • 3
    @JohnJ.Camilleri It's only an opportunity for optimization if it's not supposed to be happening - if it's *meant* to build up some big data structure, use it a bunch and then forget it, then as Carl says this is the expected shape of the graph. – user253751 Aug 25 '21 at 11:30