4

I have released a small numerical library for solving delay differential equations: http://github.com/masterdezign/dde

The main technical constraints:

  1. Make the library flexible with respect to the number of dynamical variables (x(t), y(t),...), i.e. a State of the dynamical system for a given moment of time.
  2. Easily integrate with libraries that exploit Data.Vector.Storable (e.g. hmatrix). Therefore, as input/output I extensively employ Data.Vector.Storable.

As a result, unlike in this solution: How do I optimize numerical integration performance in Haskell (with example), newtype State = State { _state :: V.Vector Double } is used instead of data State = State {-# UNPACK #-} !Double {-# UNPACK #-} !Double. However, the library runs twice slower now.

The question: is there any way to bring both the speed of data State = State {-# UNPACK #-}... and the flexibility of newtype State = State { _state :: V.Vector Double } for unspecified number of variables? Shall I consider Template Haskell to create data UNPACK-like structures during the compilation time?

penkovsky
  • 893
  • 11
  • 14
  • Which module(s) have you imported as `V`? Switching from boxed to unboxed vectors is very closely analogous to switching from `data State = State Double Double` to `data State = State {-# UNPACK #-} !Double {-# UNPACK #-} !Double`. – Daniel Wagner Mar 01 '18 at 14:02
  • To answer your question, ```import qualified Data.Vector.Storable as V```, see https://github.com/masterdezign/dde/blob/48cf99a109467c2515bf2a8f5d8fdfeaf1cca1fc/dde/Numeric/DDE.hs#L71 – penkovsky Mar 01 '18 at 14:54

1 Answers1

2

I wouldn't use any particular vector implementation. Variable-length types like Data.Vector are a bad choice not only because the extra length information is a considerable overhead when the dimension of the space is low, also because you lose any type-system guarantee that the dimensions match.

Instead, you should make everything parametric on the choice of vector space. I.e, you effectively make the dimension a compile-time variable, plus you allow vector types with some meaningfully-named subvariables.

import Data.VectorSpace
import Data.AdditiveGroup

newtype Stepper1 state = Stepper1 {
  _stepper
    ::  Double
    -> RHS state   -- parameterised in a similar way
    -> state
    -> (Double, Double)
    -> (Double, Double)
    -> state
  }

rk₄ :: VectorSpace v => Stepper1 v
rk₄ = Stepper1 _rk4
 where _rk4 hStep rhs' y₀ ... = y₀ ^+^ (h/6)*^(k₁ ^+^ 2*^k₂ ^+^ 2*^k₃ ^+^ k₄)
         where k₁ = rhs' (y₀, ...)
               k₂ = rhs' (y₀ ^+^ (h/2)*^k₁, ...)
               k₃ = rhs' (y₀ ^+^ (h/2)*^k₂, ...)
               k₄ = rhs' (y₀ ^+^ h*^k₃, ...)

Then, what particular implementation it will be can be chosen by the user. For a 2-dimensional vector, a standard choice is V2 from the linear package; it is re-exported with VectorSpace instances from free-vector-spaces. But for testing, you could also just use plain old tuples, which have a VectorSpace instance right in vector-space. Of course, wrapping around the types from hmatrix would also be possible, but this would really not be good for performance – better just convert the final results, if necessary.

To get the best performance, you may need to use some {-# INLINE #-} pragmas. Bang patterns OTOH don't usually bring much performance advantage – most important is that the types are strict, and unboxed. It's certainly no use to pre-emptively put a bang before every variable definition – those won't have any effect anyway, because a CAF is only evaluated when it's used.

I would be excited to hear about the performance you get in the end! If it's notably worse than your original State {-# UNPACK #-} !Double {-# UNPACK #-} !Double, that's something we should investigate.

leftaroundabout
  • 117,950
  • 5
  • 174
  • 319
  • Thank you for the elegant answer! I will definitely take it into account. – penkovsky Mar 01 '18 at 16:43
  • It took me a while to implement this: Haskell type system appears to be quite complicated. So here is the new version https://github.com/masterdezign/dde/tree/dev. The code is more elegant now, however there is no visible speedup. The simplest way to compare is `stack build && time stack exec mackey-glass > /dev/null` – penkovsky Mar 20 '18 at 19:45
  • Moreover, due to some reason the results of master and dev branch are not identical. I am curious to know why. – penkovsky Mar 20 '18 at 19:47
  • First, the results differ slightly and then, naturally, they diverge. But not more than 1e-13. Is it a double precision rounding problem because of a different evaluation order? – penkovsky Mar 20 '18 at 19:54
  • I'll look into it; what need I execute to see the behaviour you're describing? – leftaroundabout Mar 20 '18 at 19:57
  • For comparison, `git checkout master && stack build && time stack exec mackey-glass > v1`, `git checkout dev && stack build && time stack exec mackey-glass > v2`. However, that minor differences are not critical. I would appreciate any advice regarding computing speed. Thanks! – penkovsky Mar 20 '18 at 20:40
  • Yeah, those differences are definitely in the range you need always expect when doing numerical stuff with floating-point numbers. — As for performance – I get `real 0m0.382s` with the `master` version, but only `real 0m0.092s` with the dev version. Are you specifically unhappy with that, or do you just generally ask for ideas what might still be improvable? – leftaroundabout Mar 20 '18 at 20:55
  • On my machine, average (over 10 runs) times almost do not differ (even though there is a slight improvement). Do you use default compilation settings or some custom ghc flags? Have you tried to run several times for statistics? – penkovsky Mar 20 '18 at 20:57
  • I may need to use Criterion for precision, but let me find the older (faster) version, for completeness. – penkovsky Mar 20 '18 at 21:03
  • Ah, I had the terminal output on with the `master` version, that made it so slow. Frankly, even when piping to a file I'm dubious that that the timing has much to say about numerical performance – you'll mostly be timing output-formatting performance. Yeah, criterion would make timings much clearer. Or, just make the test _much longer but with only every 100th step shown_, or something like that. – leftaroundabout Mar 20 '18 at 21:05
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/167211/discussion-between-penkovsky-and-leftaroundabout). – penkovsky Mar 20 '18 at 21:09
  • For the reference, here is the fast version https://raw.githubusercontent.com/masterdezign/aivika-experiment-dde-test/master/src/MackeyGlass.lhs. To compile, `cabal exec ghc -- -O2 MackeyGlass.lhs`. – penkovsky Mar 20 '18 at 21:32