Is using `@code_lowered` an effective way to troubleshoot performance issues in Julia?

Question

I want to check whether the Julia code I have written has any performance issues. I know that `@code_lowered` gives a good look at how the compiler interprets the code, but when exactly is `@code_lowered` most useful? Are there certain kinds of performance or other issues that it highlights well (if you can understand the printouts), and others where it is less useful?

I also know there are other great profiling tools in Julia; I am just trying to understand the use cases for each.

julia> @code_lowered Int(1.0)
CodeInfo(
1 ─ %1  = -9.223372036854776e18 <= x
└──       goto #3 if not %1
2 ─       @_3 = x < 9.223372036854776e18
└──       goto #4
3 ─       @_3 = false
4 ┄       goto #6 if not @_3
5 ─ %7  = Base.round(x, Base.RoundToZero)
│         @_4 = %7 == x
└──       goto #7
6 ─       @_4 = false
7 ┄       goto #9 if not @_4
8 ─ %12 = Base.unsafe_trunc(Int64, x)
└──       return %12
9 ─ %14 = Base.InexactError(:Int64, Int64, x)
│   %15 = Base.throw(%14)
└──       return %15
)

All introspection macros can be useful, but IMHO `@code_warntype` is the most important. It shows you type instabilities (highlighted in red). — carstenbauer, Sep 27 '19 at 16:09

Frames Catherine White · Accepted Answer · 2019-10-01T09:22:10.837

@code_lowered is not useful for understanding performance. @code_warntype is best.

Basically the different steps:

Parsing:

Source code -> Abstract Syntax Tree (AST), like what one manipulates with macros. Blocks such as loops and conditionals are branches in the tree. You can see it by quoting the source code:

quote
    if x > 1
        x = x - 1
    end
end
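As a minimal sketch of what "a tree" means here, the snippet below inspects the quoted expression. The variable names `ex` and `cond` are only illustrative, and `Meta.parse` is used as a second way to get the same kind of tree from a string.

```julia
# Quoting source code yields an Expr, the AST node type that macros manipulate.
ex = quote
    if x > 1
        x = x - 1
    end
end

# Meta.parse builds the same kind of tree from a string of source code.
cond = Meta.parse("if x > 1; x = x - 1; end")

println(typeof(ex))    # Expr
println(cond.head)     # if
println(cond.args[1])  # x > 1

# dump prints the full nested structure, one node per line.
dump(cond)
```

The `if` is a single node whose children are the condition and the body, which is the branching structure that lowering later flattens.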

Lowering:

Abstract Syntax Tree -> Lowered Intermediate Representation (IR), which I think is best described as an abstract syntax list.

Basically nothing has been done to it: no optimization, etc. The tree form has just become a list in which nothing has subexpressions; instead, each temporary is on its own line. Loops and ifs become various kinds of jumps (like labels and conditional gotos).

A few things change in the lowered representation; for example, the end in xs[end-1] becomes lastindex(xs).

This lowering can be done fully ahead of time; it doesn't need the JIT. It is just a transform of the syntax.

You can access it via @code_lowered f(x) for some function f taking argument x. Or, if you have a Method, you can do Base.uncompressed_ast(method).
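To illustrate the `end` rewrite, here is a small sketch using a hypothetical function `second_to_last` (not from any library). The exact printout varies between Julia versions, so none is shown.

```julia
using InteractiveUtils  # provides @code_lowered outside the REPL

# Hypothetical example function that indexes with `end`.
second_to_last(xs) = xs[end - 1]

# The macro returns a CodeInfo holding the flat, lowered statement list.
lowered = @code_lowered second_to_last([1, 2, 3])
println(lowered)

# The `end` keyword no longer appears; a call to lastindex has replaced it.
lowered_text = string(lowered)
println(occursin("lastindex", lowered_text))
```

Note that the lowered form is the same whatever the argument type is, which is why it says little about performance.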

Specialization

The type information is made use of, and the optimizer runs. Things get inlined, conditions that are always true or false are removed (e.g. checks on types). The lowered IR gets turned into Typed IR.

@code_typed f(x) gets you this.
@code_typed optimize=false f(x) gets you this without the optimization step being run.
@code_warntype f(x) gets you this with extra helpful performance annotations highlighting type instabilities

This is one of the most useful layers to look at.
Probably second only to the source-code itself.
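A minimal sketch of the kind of problem this layer exposes, using two hypothetical functions, `unstable` and `stable`, written only for illustration:

```julia
using InteractiveUtils  # provides @code_typed and @code_warntype outside the REPL

# Returns an Int on one branch and a Float64 on the other: type-unstable.
unstable(x) = x > 0 ? 1 : 1.0

# Returns a Float64 on both branches: type-stable.
stable(x) = x > 0 ? 1.0 : -1.0

# @code_typed returns a pair: the typed IR and the inferred return type.
_, rt_unstable = @code_typed unstable(1)
_, rt_stable = @code_typed stable(1)

println(rt_unstable)  # a Union of the integer type and Float64
println(rt_stable)    # Float64

# @code_warntype prints the same typed IR with non-concrete types highlighted.
@code_warntype unstable(1)
```

In the `@code_warntype` output, the `Body::Union{...}` annotation is the thing to look for; a concrete return type such as `Float64` is what you want to see.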

Code-generation (LLVM)

This is where the Typed IR becomes LLVM IR. LLVM internally goes through a bunch of intermediate representations of its own and does a ton of optimization, e.g. various math intrinsics are constant-propagated at this stage.

Access this via @code_llvm f(x)

This is the next most useful stage after Typed. (It ties with native machine code, depending on whether you read LLVM IR better or worse than assembly.)
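As a rough sketch, the LLVM IR for a trivial function can be printed or captured as a string. The function `add_one` is hypothetical, and the exact IR text depends on the Julia and LLVM versions.

```julia
using InteractiveUtils  # provides @code_llvm and code_llvm outside the REPL

# Hypothetical example function, restricted to Int64 so the IR is predictable.
add_one(x::Int64) = x + one(x)

# Print the LLVM IR for this specialization.
@code_llvm add_one(Int64(1))

# The function form writes to any IO, so the IR can be captured as a string.
llvm_text = sprint(io -> code_llvm(io, add_one, (Int64,)))
println(occursin("i64", llvm_text))
```

For a function this small, the body should boil down to a single integer add followed by a return.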

Code-generation (Assembly)

The LLVM IR becomes assembly, which is basically human-readable machine code. Access this via @code_native f(x).

After that there is only assembling to machine code. Though I suspect that the assembly is normally bypassed, and one goes straight from LLVM to machine code, since they are 1-1.
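A short sketch of viewing this last stage, again with a hypothetical function (`double_it`). The output depends on your CPU architecture and Julia version, so none is shown.

```julia
using InteractiveUtils  # provides @code_native and code_native outside the REPL

# Hypothetical example function.
double_it(x::Float64) = 2.0 * x

# Print the assembly generated for the Float64 specialization.
@code_native double_it(1.0)

# As with the other stages, the function form can write to any IO.
native_text = sprint(io -> code_native(io, double_it, (Float64,)))
println(length(native_text) > 0)
```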

See also: https://stackoverflow.com/a/43456211/179081

See also: https://stackoverflow.com/questions/43453944/what-is-the-difference-between-code-native-code-typed-and-code-llvm-in-julia — StefanKarpinski, Sep 30 '19 at 13:49