While this may sound like a theoretical question, suppose I decide to invest in and build a mission-critical application written in Haskell. A year later I find that I absolutely need to improve the performance of some very thin bottleneck, and this will require optimizing memory access close to raw machine capabilities.
Some assumptions:
- It isn't a real-time system - occasional latency spikes are tolerable (from interrupts, thread-scheduling irregularities, occasional GC, etc.)
- It isn't a numeric problem - data layout and cache-friendly access patterns matter most (avoiding pointer chasing, reducing conditional jumps, etc.)
- Code may be tied to a specific GHC release (but no forking)
- The performance goal requires in-place modification of pre-allocated off-heap arrays, taking alignment into account (C strings, bit-packed fields, etc.) - see the sketch after this list
- Data is statically bounded in arrays, and allocations are rarely if ever needed
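For concreteness, here is roughly the kind of code I mean - one up-front aligned allocation, then purely in-place reads and writes. This is a minimal sketch using the standard `Foreign.*` machinery; the `alloca*` buffer is a pinned byte array rather than literal `malloc` memory, which is close enough for illustration:

```haskell
import Foreign.Marshal.Alloc (allocaBytesAligned)
import Foreign.Ptr (Ptr)
import Foreign.Storable (peekElemOff, pokeElemOff)
import Data.Word (Word64)

main :: IO ()
main =
  -- One up-front allocation: 4 KiB aligned to a 64-byte cache line.
  -- alloca* buffers are pinned byte arrays, so the GC never moves them.
  allocaBytesAligned 4096 64 $ \buf -> do
    let n = 512 :: Int
    -- Initialize, then mutate strictly in place - no further allocation.
    mapM_ (\i -> pokeElemOff (buf :: Ptr Word64) i (fromIntegral i)) [0 .. n - 1]
    mapM_ (\i -> peekElemOff buf i >>= pokeElemOff buf i . (+ 1)) [0 .. n - 1]
    peekElemOff buf 7 >>= print   -- slot 7 was written as 7, then bumped: prints 8
```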
What mechanisms does GHC offer to perform this kind of optimization reliably? By "reliably" I mean that if a source change causes the code to stop performing, it can be fixed at the source level without rewriting it in assembly.
- Is it already possible using GHC-specific extensions and libraries? (First sketch below.)
- Would custom FFI help avoid the C calling-convention overhead? (Second sketch below.)
- Could a special-purpose compiler plugin do it through a restricted source DSL?
- Could a source-code generator from a "high-level" assembly (LLVM?) be a solution?
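On the first point: `GHC.Exts` already exposes primops that look like what I need - pinned, aligned allocation plus raw indexed reads and writes with no boxing in the hot loop. A sketch of what I have in mind (the primop vocabulary does shift between GHC releases, which is exactly the tie-to-a-release trade-off I listed above):

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import GHC.Exts
import GHC.IO (IO (..))

main :: IO ()
main = do
  w <- IO $ \s0 ->
    -- 512 machine words (4 KiB on 64-bit), aligned to a 64-byte cache line;
    -- pinned, so the GC will never move it.
    case newAlignedPinnedByteArray# 4096# 64# s0 of
      (# s1, mba #) ->
        let fill i s
              | isTrue# (i >=# 512#) = s
              | otherwise =
                  fill (i +# 1#) (writeWordArray# mba i (int2Word# i) s)
            bump i s        -- in-place read-modify-write, no allocation
              | isTrue# (i >=# 512#) = s
              | otherwise =
                  case readWordArray# mba i s of
                    (# s', v #) ->
                      bump (i +# 1#) (writeWordArray# mba i (plusWord# v 1##) s')
        in case readWordArray# mba 7# (bump 0# (fill 0# s1)) of
             (# s2, v #) -> (# s2, W# v #)
  print w   -- slot 7 was filled with 7, then bumped in place: prints 8
```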
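On the FFI point, my understanding is that an `unsafe` import already sheds most of the per-call machinery (no capability release, no support for callbacks or GC during the call), leaving something close to a plain C call. For example (`bump_words` is a hypothetical C routine of mine, not a real library function):

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
import Foreign.Ptr (Ptr)
import Data.Word (Word64)

-- `unsafe` tells GHC the call is brief and cannot call back into Haskell,
-- so it compiles to (nearly) a bare function call.
foreign import ccall unsafe "bump_words"
  c_bumpWords :: Ptr Word64   -- pre-allocated, aligned buffer
              -> Int          -- element count
              -> IO ()
```

So the open question for me is whether the residual per-call cost of even an `unsafe` import is low enough for a hot inner loop, or whether the loop itself has to live on one side of the boundary.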