3

I have the following coupled system of ODEs (that come from discretizing an integrodifferential PDE): enter image description here

The xi's are points on an x-grid that I control. I can solve this with the following simple piece of code:

using DifferentialEquations

function ode_syst(du,u,p, t)
    N = Int64(p[1])
    beta= p[2]
    deltax = 1/(N+1)
    xs = [deltax*i for i in 1:N]
    for j in 1:N
        du[j] = -xs[j]^(beta)*u[j]+deltax*sum([u[i]*xs[i]^(beta) for i in 1:N])
    end
end

N = 1000
u0 = ones(N)
beta = 2.0
p = [N, beta]
tspan = (0.0, 10^3);

prob = ODEProblem(ode_syst,u0,tspan,p);
sol = solve(prob);

However, as I make my grid finer, i.e. increase N, the computation time grows rapidly (I guess the scaling is quadratic in N). Is there any suggestion on how to implement this using either distributed parallelism or multithreading?

Additional information: I attach the profiling diagram that might be useful to understand where the program spends most of the timeenter image description here

dapias
  • 2,512
  • 3
  • 15
  • 23

2 Answers2

5

I've looked into your code and found a few problems such as an accidentally introduced O(N^2) behavior due to recalculating the sum term.

My improved version uses the Tullio package to get further speed up from vectorization. Tullio also has tuneable parameters which would allow for multi threading if your system becomes big enough. See here what parameters you can tune in the options section. You might also see GPU support there, i didn't test that but it might yield further speed up or break hooribly. I also choose get the length from the acutal array which should make the use more economical and less error prone.

using Tullio

function ode_syst_t(du,u,p, t)
    N = length(du)
    beta = p[1]
    deltax = 1/(N+1)
    @tullio s := deltax*(u[i]*(i*deltax)^(beta))
    @tullio du[j] = -(j*deltax)^(beta)*u[j] + s
    return nothing
end

Your code:

 @btime sol = solve(prob);
  80.592 s (1349001 allocations: 10.22 GiB)

My code:

prob2 = ODEProblem(ode_syst_t,u0,tspan,[2.0]);
@btime sol2 = solve(prob2);
  1.171 s (696 allocations: 18.50 MiB)

and the result more or less agree:

julia> sum(abs2, sol2(1000.0) .- sol(1000.0))
1.079046922815598e-14

Lutz Lehmanns solution i also benchmarked:

prob3 = ODEProblem(ode_syst_lehm,u0,tspan,p);

@btime sol3 = solve(prob3);
  1.338 s (3348 allocations: 39.38 MiB)

However as we scale N to 1000000 with a tspan of (0.0, 10.0)

prob2 = ODEProblem(ode_syst_t,u0,tspan,[2.0]);

@time solve(prob2);
  2.429239 seconds (280 allocations: 869.768 MiB, 13.60% gc time)

prob3 = ODEProblem(ode_syst_lehm,u0,tspan,p);

@time solve(prob3);
  5.961889 seconds (580 allocations: 1.967 GiB, 11.08% gc time)

My code becomes more than twice as fast due to using the 2 cores in my old and rusty machine.

  • Many, many thanks. As I said in reply to @LutzLehmann, I had a wrong intuition about setting the condition for every `u`. My impression now is that from what we have here to parallelize, there is a tiny gap. – dapias Sep 30 '21 at 09:26
3

Analyze the formula. Obviously, the atomic terms repeat. So they should only be computed once.

function ode_syst(du,u,p, t)
    N = Int64(p[1])
    beta= p[2]
    deltax = 1/(N+1)
    xs = [deltax*i for i in 1:N]
    term = [ xs[i]^(beta)*u[i] for i in 1:N]
    term_sum = deltax*sum(term)
    for j in 1:N
        du[j] = -term[j]+term_sum
    end
end

This should only linearly increase in N.

Lutz Lehmann
  • 25,219
  • 2
  • 22
  • 51
  • Many thanks. I thought it was not possible to write the function in this way because somehow Julia will not realize that the variable `u` is on both sides of the equation. I realized that my intuition was wrong – dapias Sep 30 '21 at 09:20