6

Mind the following Haskell program:

-- (^) allocs memory so we define it using the native (**)
pow :: Int -> Int -> Int
pow x y = floor $ fromIntegral x ** fromIntegral y

-- tail recursive, div and mod are native, I believe, so, no alloc
isPalindrome :: Int -> Bool
isPalindrome x = go (pow 10 (floor $ logBase 10 $ fromIntegral x)) x
    where go m x = x <= 0 || div x m == mod x 10 && go (div m 100) (div (x - m * mod x 10) 10)

-- go is tail recursive too... no obvious allocation here
wanderer :: Int -> Int
wanderer n = go 0 (pow 10 n - 1) (pow 10 n - 1) where
    go p x y | p > 0 && div p x >= pow 10 n  = p                
    go p x y | p > 0 && y < div p x || y < x = go p (x-1) (pow 10 n - 1)
    go p x y | isPalindrome (x*y)            = go (x*y) x (y-1) 
    go p x y                                 = go p x (y-1)     

main = print . wanderer $ 8

Profiling it, we get:

Fri May  8 03:36 2015 Time and Allocation Profiling Report  (Final)

       aff +RTS -p -RTS

    total time  =        7.34 secs   (7344 ticks @ 1000 us, 1 processor)
    total alloc = 6,487,919,472 bytes  (excludes profiling overheads)

COST CENTRE     MODULE    %time %alloc

isPalindrome    Main       41.9   18.5
isPalindrome.go Main       22.6    1.4
wanderer.go     Main       20.0   67.8
pow             Main       15.5   12.3


                                                                individual     inherited
COST CENTRE           MODULE                  no.     entries  %time %alloc   %time %alloc

MAIN                  MAIN                     47           0    0.0    0.0   100.0  100.0
 main                 Main                     95           0    0.0    0.0     0.0    0.0
 CAF:main1            Main                     92           0    0.0    0.0     0.0    0.0
  main                Main                     94           1    0.0    0.0     0.0    0.0
 CAF:main2            Main                     91           0    0.0    0.0   100.0  100.0
  main                Main                     96           0    0.0    0.0   100.0  100.0
   wanderer           Main                     98           1    0.0    0.0   100.0  100.0
    pow               Main                    101           1    0.0    0.0     0.0    0.0
    wanderer.go       Main                     99    49995002   20.0   67.8   100.0  100.0
     isPalindrome     Main                    102    49985002   41.9   18.5    80.0   32.2
      pow             Main                    104    49985002   15.5   12.3    15.5   12.3
      isPalindrome.go Main                    103    52207117   22.6    1.4    22.6    1.4
     pow              Main                    100           1    0.0    0.0     0.0    0.0
   pow                Main                     97           2    0.0    0.0     0.0    0.0
 CAF                  GHC.Conc.Signal          85           0    0.0    0.0     0.0    0.0
 CAF                  GHC.IO.Encoding          78           0    0.0    0.0     0.0    0.0
 CAF                  GHC.IO.Encoding.Iconv    76           0    0.0    0.0     0.0    0.0
 CAF                  GHC.IO.Handle.FD         69           0    0.0    0.0     0.0    0.0
 CAF                  GHC.Event.Thread         55           0    0.0    0.0     0.0    0.0

As far as I'm aware, it seems like all my functions are tail recursive and those prelude functions are asm operations. Yet this simple program allocs 7gb of memory. Where is all the allocation coming from?

MaiaVictor
  • 51,090
  • 44
  • 144
  • 286
  • It's a bit hard to read what's going on in `ref` and `go` but it will most likely be lot's of thunks (sometimes *tail-recursion* is not the best with Haskell) - I would try to make those functions strict (should be easy with [BangPatterns](https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/bang-patterns.html)) – Random Dev May 08 '15 at 07:04

2 Answers2

14

The allocation is coming from the go in isPalindrome:

go m x = x <= 0 || div x m == mod x 10 && go (div m 100) (div (x - m * mod x 10) 10)

We have a || on the right hand side. The short-circuit semantics of || is realized through lazy evaluation. GHC sees that the m argument is unused if x <= 0 evaluates to True, so it doesn't unbox m, allowing it to remain uncomputed. Of course, in this case we're better off unboxing m too, so let's do that.

{-# LANGUAGE BangPatterns #-}
go !m x = ...

Now with ghc -O2 and +RTS -s:

      52,016 bytes allocated in the heap
       3,408 bytes copied during GC
      44,312 bytes maximum residency (1 sample(s))
      17,128 bytes maximum slop
           1 MB total memory in use (0 MB lost due to fragmentation)
András Kovács
  • 29,931
  • 3
  • 53
  • 99
  • I find it very strange that the run time didn't change much after that. So allocating 6gb of memory costed no time at all? – MaiaVictor May 08 '15 at 16:00
  • Well, allocation is pretty cheap with GHC. It seems be relatively cheap in comparison to the non-allocating computation here. I optimized your code to about [3.5x speed](http://lpaste.net/132236), and in that case the allocating version of the code was ~20% slower, so it doesn't cost nothing. – András Kovács May 08 '15 at 17:39
  • Wow, very cool, thank you. Is there any reason not to replace Prelude's `(^)` and `even` with yours? – MaiaVictor May 09 '15 at 11:32
  • I'm pretty sure the functions are slow in `base` because `even` fails to inline in ghc-7.10. If you compile with ghc 7.8.4 with `base`-exported `^` and `even`, you still get no heap allocation and good speed. I found this out when I wrote [this answer](http://stackoverflow.com/questions/29875886/haskell-why-does-int-performs-worse-than-word64-and-why-my-program-is-far-slow/29879538#29879538). So the issue is not with `base`. Although one could argue that `base` should define `even` as I did, because without LLVM we don't get the bitwise check in the machine code. – András Kovács May 09 '15 at 12:50
  • `52,016 bytes allocated in the heap`. How come this value is so low here? I usually see a high value (in GBs) even if `total memory in use` is less than `10m`. – Nawaz Jul 14 '20 at 23:14
2

GHC has sometimes been termed "evaluation by allocation". What's more important is whether or not the memory is retained. Just eyeballing the code, I don't see any obvious space leaks.

Have you looked at the runtime stats? You don't even need profiling libraries for that. What is of greater interest here is the GC bytes copied, and maximum residency... both of which I would guess to be comparatively very small.

Beyond the first-order concerns, most of those allocations are probably closures. Are you compiling ghc -O in order to run the strictness analyzer? Even if you are, I'm not sure that the strictness analyzer would be able to eliminate closure allocation for the m argument in the first go function or the x and y arguments in the second go. My guess is that bang patterns would greatly reduce your allocations here.

The practical consequences of tail recursion are very different in Haskell than a strict language; so if you are familiar with tail calls in a strict language, your intuitions are probably off.

lpsmith
  • 707
  • 3
  • 8