In Julia, I am trying out different parallelization libraries, both to make my program more performant and to check whether memory consumption is the same as without parallelization. The unfortunate side effect of this is a lot of code duplication.

Is there a way to organize my code so that I write the algorithm only once, and then some macro with a parameter decides how the code is parallelized? My question is similar to this one. For example, my MWE

using ThreadsX, Folds, FLoops, Polyester
create_data = (n, s) -> [rand(1:n, rand(1:s)) for j = 1:n]   # n vectors of random length ≤ s, entries in 1:n

function F!(method::Int, L::Vector{Vector{Int}})::Nothing
    n = length(L)
    if method==0                  for j=1:n sort!(L[j]) end end
    if method==1 Threads.@threads for j=1:n sort!(L[j]) end end
    if method==2 ThreadsX.foreach(1:n) do j sort!(L[j]) end end
    if method==3 Folds.foreach(1:n)    do j sort!(L[j]) end end
    if method==4 FLoops.@floop    for j=1:n sort!(L[j]) end end
    if method==5 Polyester.@batch for j=1:n sort!(L[j]) end end 
    return nothing end
for mtd=0:5
    L = create_data(10^6,10^3);   
    @time F!(mtd,L) end

returns

 17.967120 seconds
  4.537954 seconds (38 allocations: 3.219 KiB)
  4.418978 seconds (353 allocations: 27.875 KiB)
  5.583201 seconds (54 allocations: 3.875 KiB)
  5.542852 seconds (53 allocations: 3.844 KiB)
  4.263488 seconds (3 allocations: 80 bytes)

so performance already differs for a very simple problem. In my actual case, instead of sort!(L[j]) I have a lot of intensive code involving several Arrays, Vector{Vector}s, Dicts, ..., where different threads occasionally read from the same place but write to different places, allocate memory, mutate the input, etc. Is there a way to create a new macro @Parallel so that my code would be just

function F!(method::Int, L::Vector{Vector{Int}})::Nothing
    n = length(L)
    @Parallel(method) for j=1:n sort!(L[j]) end
    return nothing end

Note that I have never written a macro; I have only used existing ones so far, so some explanation would be welcome.

Leo

1 Answer

A macro-based solution is possible, but seems unnecessary to me. I'd rather organize code like this:

function testfun!(x::Vector{Int})
    # here goes the repetitive part
    return sort!(x)
end

# entry point for dispatch, and set up data
function F(kernel!, M, N, strategy::Symbol)
    L = create_data(M, N)
    return F!(kernel!, L, Val{strategy}())
end

# and a function to run the loop for every strategy
function F!(kernel!, L, ::Val{:baseline})
    n = length(L)
    for j=1:n kernel!(L[j]) end
end
function F!(kernel!, L, ::Val{:Threads})
    n = length(L)
    Threads.@threads for j=1:n kernel!(L[j]) end
end
function F!(kernel!, L, ::Val{:ThreadsX})
    n = length(L)
    ThreadsX.foreach(1:n) do j kernel!(L[j]) end 
end
# ...
# + rest of the loop functions for Floops, Folds, etc.

using BenchmarkTools: @benchmark

function dotests()
    for strategy = (:baseline, :Threads, :ThreadsX, ...)
        # interpolate with $ so BenchmarkTools benchmarks the value,
        # not access to a non-constant variable
        display(@benchmark F(testfun!, 10^6, 10^3, $strategy))
    end
end

This shows a dispatch-based approach. You could equally well use dictionaries or conditionals. The important point is to separate the "runner" F! from the "kernel" function.
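Since the question explicitly asks for a macro, here is a minimal sketch of how one could look. To be clear about assumptions: the name `@parallel` and the runtime-branch design are mine, and only the serial and `Threads.@threads` branches are written out; branches for `FLoops.@floop`, `Polyester.@batch`, etc. would follow the same pattern. Also note the call form: it must be `@parallel method for ...` with space-separated arguments, because a parenthesized call `@parallel(method)` ends at the closing parenthesis and would not capture the following `for` block.

```julia
# Sketch of a @parallel macro: takes a method id and a for loop, and expands
# to a runtime branch over the available parallelization strategies.
# Only methods 0 and 1 are implemented here (no external packages needed);
# the other strategies would be added as further elseif branches.
macro parallel(method, loop)
    (loop isa Expr && loop.head === :for) ||
        error("@parallel expects a for loop as its second argument")
    ex = quote
        if $method == 0
            $(deepcopy(loop))                      # 0: serial baseline
        elseif $method == 1
            Threads.@threads $(deepcopy(loop))     # 1: Base threading
        else
            error("unknown method ", $method)
        end
    end
    # esc makes the caller's variables (n, L, ...) visible inside the loop;
    # each branch gets its own deepcopy because the threading macros
    # rewrite the loop expression.
    return esc(ex)
end

function F!(method::Int, L::Vector{Vector{Int}})::Nothing
    n = length(L)
    @parallel method for j = 1:n
        sort!(L[j])
    end
    return nothing
end
```

All branches are compiled even though only one is taken at runtime, which keeps the call site uniform but means every strategy's packages must be loaded. If the method id is known at compile time, one could instead make it a literal macro argument and emit only the chosen branch.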

phipsgabler
  • In your first `F`, you probably wanted to insert `M, N` instead of `10^6, 10^3`. What is `j` inside `testfun!`? Isn't it an undefined reference? Is the `!` in the argument `kernel!` just notation, or does it have a special role? What do you mean by `#...`? I wish to use all 6 methods, and I don't want to duplicate code for that, since I have a lot more than just `sort!(L[j])`. Could you please provide a runnable solution, so I can test it out? – Leo Jan 27 '23 at 10:13
  • There were some copy/paste errors, I hope it's clear now what I intended to convey. I thought the repetitive part is in the "kernel" (the `sort!(L[j])`) -- and this has to be written only once here. The outer loop can potentially differ for different strategies (already here, we have `for` and `do` syntax), so it has to be written individually in _some_ way. – phipsgabler Jan 27 '23 at 11:09
  • The `!` is indeed just by convention (a function modifying its argument), it has no meaning to the language. – phipsgabler Jan 27 '23 at 11:10
  • Thank you for your response and clarifications. I didn't know `Val` can be used for this. Your solution requires 6 new functions for my original `F!`. In my program, I have 50+ functions where I use multithreading, so I don't wish to create 300 new functions just for this purpose. Thus I am still hoping for a macro (that would be compiler friendly, i.e. not decrease performance). If `do` is the problem, then at least a macro for cases 0,1,4,5. – Leo Jan 27 '23 at 11:28
  • Well, my idea was to separate the multithreading from the "inner part". In the example you provided, this works as I wrote it (you could add a `testfun2` at the cost of one additional function). Without insight into your actual code I can't really say anything better. – phipsgabler Jan 27 '23 at 11:30
  • In my lengthy code, there are a few inner loops, vector operations, concatenation, comprehensions, flattenings, reshuffling, slicing. I only wish to parallelize the outer loop `for j=1:n`. And I wish that my macro is usable on any such outer loop. Basically I'd like `[@empty_macro, Threads.@threads, FLoops.@floop, Polyester.@batch][mtd]`. – Leo Jan 27 '23 at 11:55
  • Btw, doesn't `function F!(mtd::Int, kernel!, L) n=length(L); if mtd==0 for j=1:n testfun!(L[j]) end end; if mtd==1 Threads.@threads for j=1:n testfun!(L[j]) end end; ... end` achieve the same thing? Isn't your solution therefore a bit of a complication? Or are there any downsides to what I've written? – Leo Jan 27 '23 at 12:01
  • Also, `create_data` should not be in `F` but rather `dotests`, since you don't want to benchmark that. And so `F` is unnecessary. – Leo Jan 27 '23 at 12:37
  • You mentioned this can be done with macros. Would you be willing to post such a solution? – Leo Jan 27 '23 at 16:47
  • Sorry, not without more context. But I suggest you ask on the Julia Discourse, where you can provide more code and have a back-and-forth discussion. – phipsgabler Jan 28 '23 at 17:05