64 bit F# / FSI: Limitation of very large single file function

Question

I have some chemical models, which are based on sparse matrices of some coefficients. So, given the model parameters, I generate F# code based on only non-zero elements of these coefficients. The generated model is then fed to ALGLIB (http://www.alglib.net/) ODE solver. The matrices of coefficients are about 99.9% to 99.99% sparse, which means that only 0.01% to 0.1% of coefficients are not exact zeros. Below is a very simplified example of how a generated F# model file looks like. It is the function update (x : array<double>) : array<double> that is fed to ALGLIB ODE solver using 64-bit FSI.

Now, the ALGLIB ODE solver is perfectly capable of handling at least 1M variables for a simple input function. I've tested that and it works with no issues. I have under 10K variables for a typical model.

However, when I increase the model size, I start getting stack overflow exception at run time: the model with about 100K LOC works fine, but the model with about 150K LOC fails with stack overflow exception.

I am guessing that this is related to how initialization / processing of large "hard-coded" arrays is handled and I wonder how should I tweak the generated code OR how can I increase the stack size for FSI and/or F# 64-bit program, let's say to 1 GB.

I shall stress that this is not a typical recursive function stack overflow issue, but just the overall size of the model, which causes the problem.

If you look at update function, you will see that it has a generated array, each element of which is produced by taking another array and applying |> Array.sum. This becomes huge for a large model and I am guessing that that could be causing a stack overflow.

Thanks a lot!

PS Below is a very simplified example of the model. It does have all the necessary structures that appear in the real model.

namespace Model

open Clm.Substances
open Clm.Model

open Clm.ReactionTypes

module ModelData = 
    let seedValue = 123456
    let numberOfAminoAcids = NumberOfAminoAcids.OneAminoAcid
    let maxPeptideLength = MaxPeptideLength.TwoMax
    let numberOfSubstances = 7

    let aminoAcids = AminoAcid.getAminoAcids numberOfAminoAcids
    let chiralAminoAcids = ChiralAminoAcid.getAminoAcids numberOfAminoAcids
    let peptides = Peptide.getPeptides maxPeptideLength numberOfAminoAcids

    let allSubst = 
        [ Substance.food ]
        @
        (chiralAminoAcids |> List.map (fun a -> Chiral a))
        @
        (peptides |> List.map (fun p -> PeptideChain p))

    let allInd = allSubst |> List.mapi (fun i s -> (s, i)) |> Map.ofList


    let getTotalSubst (x : array<double>) = 
        [|
            x.[0] // Y
            x.[1] // A
            x.[2] // a
            2.0 * x.[3] // AA
            2.0 * x.[4] // Aa
            2.0 * x.[5] // aA
            2.0 * x.[6] // aa
        |]
         |> Array.sum


    let getTotals (x : array<double>) = 
        [|
            // A
            (
                [|
                    x.[1] // A
                    2.0 * x.[3] // AA
                    x.[4] // Aa
                    x.[5] // aA
                |]
                |> Array.sum
                ,
                [|
                    x.[2] // a
                    x.[4] // Aa
                    x.[5] // aA
                    2.0 * x.[6] // aa
                |]
                |> Array.sum
            )
        |]

    let update (x : array<double>) : array<double> = 
        let xSum = (x |> Array.sum) - x.[0]

        let xSumN = 
            [|
                1.0 * x.[1] // A
                1.0 * x.[2] // a
                2.0 * x.[3] // AA
                2.0 * x.[4] // Aa
                2.0 * x.[5] // aA
                2.0 * x.[6] // aa
            |]
            |> Array.sum

        let xSumSquaredN = 
            [|
                1.0 * x.[1] * x.[1] // A
                1.0 * x.[2] * x.[2] // a
                2.0 * x.[3] * x.[3] // AA
                2.0 * x.[4] * x.[4] // Aa
                2.0 * x.[5] * x.[5] // aA
                2.0 * x.[6] * x.[6] // aa
            |]
            |> Array.sum

        [|

            // 0 - Y
            [|

                0.0001 * x.[2] // a | SynthesisName: Y <-> a
                -0.001 * x.[0] // Y | SynthesisName: Y <-> a
                0.0001 * x.[1] // A | SynthesisName: Y <-> A
                -0.001 * x.[0] // Y | SynthesisName: Y <-> A
            |]
            |> Array.sum

            // 1 - A
            [|

                0.0001 * x.[5] // aA | LigationName: a + A <-> aA
                -0.001 * x.[2] * x.[1] // a + A | LigationName: a + A <-> aA
                0.0001 * x.[4] // Aa | LigationName: A + a <-> Aa
                -0.001 * x.[1] * x.[2] // A + a | LigationName: A + a <-> Aa
                0.0001 * x.[3] // AA | LigationName: A + A <-> AA
                0.0001 * x.[3] // AA | LigationName: A + A <-> AA
                -0.001 * x.[1] * x.[1] // A + A | LigationName: A + A <-> AA
                -0.001 * x.[1] * x.[1] // A + A | LigationName: A + A <-> AA
                -0.0001 * x.[1] // A | SynthesisName: Y <-> A
                0.001 * x.[0] // Y | SynthesisName: Y <-> A
            |]
            |> Array.sum

            // 2 - a
            [|

                0.0001 * x.[5] // aA | LigationName: a + A <-> aA
                -0.001 * x.[2] * x.[1] // a + A | LigationName: a + A <-> aA
                0.0001 * x.[4] // Aa | LigationName: A + a <-> Aa
                -0.001 * x.[1] * x.[2] // A + a | LigationName: A + a <-> Aa
                0.0001 * x.[6] // aa | LigationName: a + a <-> aa
                0.0001 * x.[6] // aa | LigationName: a + a <-> aa
                -0.001 * x.[2] * x.[2] // a + a | LigationName: a + a <-> aa
                -0.001 * x.[2] * x.[2] // a + a | LigationName: a + a <-> aa
                -0.0001 * x.[2] // a | SynthesisName: Y <-> a
                0.001 * x.[0] // Y | SynthesisName: Y <-> a
            |]
            |> Array.sum

            // 3 - AA
            [|

                -0.0001 * x.[3] // AA | LigationName: A + A <-> AA
                0.001 * x.[1] * x.[1] // A + A | LigationName: A + A <-> AA
            |]
            |> Array.sum

            // 4 - Aa
            [|

                -0.0001 * x.[4] // Aa | LigationName: A + a <-> Aa
                0.001 * x.[1] * x.[2] // A + a | LigationName: A + a <-> Aa
            |]
            |> Array.sum

            // 5 - aA
            [|

                -0.0001 * x.[5] // aA | LigationName: a + A <-> aA
                0.001 * x.[2] * x.[1] // a + A | LigationName: a + A <-> aA
            |]
            |> Array.sum

            // 6 - aa
            [|

                -0.0001 * x.[6] // aa | LigationName: a + a <-> aa
                0.001 * x.[2] * x.[2] // a + a | LigationName: a + a <-> aa
            |]
            |> Array.sum

        |]

    let modelDataParams = 
        {
            numberOfSubstances = 7
            numberOfAminoAcids = OneAminoAcid
            maxPeptideLength = TwoMax
            getTotals = getTotals
            getTotalSubst = getTotalSubst
            allSubst = allSubst
            allInd = allInd

            allRawReactions = 
                [
                ]

            allReactions = 
                [
                    (SynthesisName, 2)
                    (LigationName, 4)
                ]
        }

I do not have an answer for you but this post may help you https://stackoverflow.com/questions/28656872/why-is-stack-size-in-c-sharp-exactly-1-mb — AMieres, Nov 28 '18 at 13:16
Maybe it would help understand the issue better if you could also share the code where you call ALGLIB and where you specify the number of LOCs. BTW, what is LOC? — AMieres, Nov 28 '18 at 13:19
Ye, LOC is lines of code. The models are huge and pre-generating them is the only way that it can work: a 5,000 x 5,000 matrix has, well, 25M coefficients, which are reduced to, let's say, about 500 with 0.01% of non-zero coefficients. Subsequently a 150K LOC model needs about 25 GB of memory to generate but under 1 GB to run. — Konstantin Konstantinov, Nov 28 '18 at 14:34
@AMieres , the only option that I could use from the reference was EDITBIN.EXE because I run the models under FSI. Unfortunately, changing fsiAnyCpu.exe made it unusable and VS could no longer load it :( — Konstantin Konstantinov, Nov 28 '18 at 14:55
What produces the stack-overflow? Is it generating the 150K LOCs? Is it the `update` function? Is it calling ALGLIB? If you are not sure you should pepper your code with `printfn`s to discover exactly where it is happening. — AMieres, Nov 28 '18 at 15:09
@AMieres The model generation works fine even though it needs a huge memory footprint. However, when I run the model it fails. Thanks for the hint about `printfn`. I will give it a try and advise. — Konstantin Konstantinov, Nov 28 '18 at 15:26
Can you confirm whether the `StackOverflowException` occurs in the ALGLIB code? — Curt Nichols, Nov 28 '18 at 18:45
@CurtNichols and @AMieres I checked and can confirm that stack overflow happens during call to`update`. What I mean is that the call "starts" but it never gets inside the function. If I delete all content of `update` and replace it to just return input `x`, then everything works. Or, if I use a smaller model (which results in about 100K LOC), then it also works. I guess, that this is the current limit of F# or, maybe, .NET. — Konstantin Konstantinov, Nov 29 '18 at 02:45

AMieres · Accepted Answer · 2018-11-29T21:51:26.863

1

To summarize our findings.

After a series of trial an errors testing, tracing and debugging different hypothesis @Konstantin was able to discover that the issue was due to the JIT compiler. Apparently it was trying to compile the update function prior to its first execution. This function was too large and that was causing the Stack Overflow.

Splitting the function into smaller ones was the solution.

Bravo Konstantin!

edited Nov 29 '18 at 21:51

answered Nov 29 '18 at 06:04

AMieres

4,944
1
14
19

Thanks for the idea. I will try this or some modifications of it. The problem is that F# does not even get into the function `update`. Stack overflow happens between a call to `update` and a first line in it (`printfn`). When stack overflow does not happen (for smaller model), then the first call takes very long time (like 10-15 seconds). However, all subsequent calls (with some different input values) are fast. My guess is that F# builds some internal maps for the function on that first run and that crushes it. I filed a ticket with developers: https://github.com/fsharp/fsharp/issues/884 – Konstantin Konstantinov Nov 29 '18 at 12:49
You may be right about that, .Net could be trying to optimize the function when it is called the first time. So maybe it is this process that overflows. The change in syntax may help the optimizer as there are less nested code. – AMieres Nov 29 '18 at 13:01
1

I googled a little bit more on the subject: when then code is run first time the CLR invokes JIT compiler to re-compile MSIL into native code. That explains the delay on the first run and that exactly where stack overflow happens. I examined decompiled F# code and there is nothing special in that `update` function. So, I guess that JIT can't compile such a big function and the only solution is to try splitting the function into a large number of smaller pieces - basically take all individual array elements out of `update`. I will try and advise. – Konstantin Konstantinov Nov 29 '18 at 14:32
I think that now I am 100% sure that it is NET JIT compiler that fails. I tweaked the generator and if I take all internal arrays out as separate functions (there might be many thousands of them), then everything works and even the first call to `update` becomes very fast. – Konstantin Konstantinov Nov 29 '18 at 18:10
Seems like you have your solution! – AMieres Nov 29 '18 at 18:13
Half of the credit is yours :) If you want to edit your answer to sum up all the ideas that we discussed (there is no need for an elaborate description, just a lines will suffice), I will mark it as an answer. – Konstantin Konstantinov Nov 29 '18 at 21:21

64 bit F# / FSI: Limitation of very large single file function

1 Answers1