14

I'm extremely dissatisfied after translating a program from Python to Julia:

  • for small/very small inputs, Python is faster
  • for medium inputs, Julia is faster (but not that much)
  • for big inputs, Python is faster

I think the reason is that I don't understand how memory allocation works (autodidact here, no CS background). I would post my code here but it is too long and too specific and it would not be beneficial for anybody but me. Therefore I made some experiments and now I have some questions.

Consider this simple script.jl:

function main()
    @time begin
        a = [1,2,3]
    end
end
main()

When I run it I get:

$ julia script.jl
  0.000004 seconds (1 allocation: 96 bytes)

1. Why 96 bytes? When I set a = [] I get 64 bytes (why does an empty array weight so much?). 96 bytes - 64 bytes = 32 bytes. But a is an Array{Int64,1}. 3 * 64 bits = 3 * 8 bytes = 24 bytes != 32 bytes.

2. Why do I get 96 bytes even if I set a = [1,2,3,4]?

3. Why do I get 937.500 KB when I run this:

function main()
    @time begin
        for _ in 1:10000
            a = [1,2,3]
        end
    end
end
main()

and not 960.000 KB?

4. Why is, for instance, filter() so inefficient? Take a look at this:

check(n::Int64) = n % 2 == 0

function main()
    @time begin
        for _ in 1:1000
            a = [1,2,3]
            b = []
            for x in a
                check(x) && push!(b,x)
            end
            a = b
        end
    end
end
main()
$ julia script.jl
  0.000177 seconds (3.00 k allocations: 203.125 KB)

instead:

check(n::Int64) = n % 2 == 0

function main()
    @time begin
        for _ in 1:1000
            a = [1,2,3]
            a = filter(check,a)
        end
    end
end
main()

$ julia script.jl
  0.002029 seconds (3.43 k allocations: 225.339 KB)

And if I use an anonymous function (x -> x % 2 == 0)instead of check inside filter, I get:

$ julia script.jl
  0.004057 seconds (3.05 k allocations: 206.555 KB)

Why should I use a built-in function if it is slower and needs more memory?

Pigna
  • 2,792
  • 5
  • 29
  • 51
  • @πάντα ῥεῖ why did you take off the c++ tag? I put it considering that julia is written in C++, therefore these allocation memory issues may be the same in C++ – Pigna Feb 29 '16 at 17:23
  • That a language was written in a specific other one doesn't justify to put the language tag. That's for questions where you ask for c++ coding problems in particular (read the tag info). You are asking about julia not c++ (you didn't even mention it at any point in your question). – πάντα ῥεῖ Feb 29 '16 at 17:26
  • If the larger inputs are your concern when comparing Python/Julia (usually the case when scaling a solution) then presenting the actual problem is better. Both the Julia and the Python code can almost always be made faster with a little think-over! – Dan Getz Feb 29 '16 at 19:34
  • but it's 120 lines in python and more than 250 in Julia :/ @user3580870 – Pigna Feb 29 '16 at 19:38
  • But the time-consuming bit might be presented in less lines. Even without showing executable code, the "critical loop" is often very telling of the lowest hanging optimization fruit. – Dan Getz Feb 29 '16 at 19:40
  • lol, you mean critical loopS :) there are like 5-6 different nested loops that are the core of the algorithm (it's about game theory) – Pigna Feb 29 '16 at 19:42
  • Does AlphaGo have something to worry about? ;) – Dan Getz Feb 29 '16 at 19:43
  • Ahah no I don't think so. I mean, it could have some applications in board games, but definitely not in Go. – Pigna Feb 29 '16 at 19:46

2 Answers2

17

Quick answers:

1. Arrays keep track of their dimensionality and size, among other things, in a header.

2. Julia ensures its arrays are 16-byte aligned. The pattern becomes obvious if you look at the allocations for a few more examples:

julia> [@allocated(Array{Int64}(i)) for i=0:8]'
1x9 Array{Any,2}:
 64  80  80  96  96  112  112  128  128

3. It's reporting in kilobytes. There are 1024 bytes in a kilobyte:

julia> 937.500 * 1024
960000.0

4. Anonymous functions and passing functions to higher order functions like filter are known performance gotchas in 0.4, and have been fixed in the latest development version.

In general, getting more allocations than you expect is often a sign of type-instability. I highly recommend reading through the manual's Performance Tips page for more information about this.

mbauman
  • 30,958
  • 4
  • 88
  • 123
  • Thanks! About number 4, I inverted the execution times. Now I edited it right. Still, even without using an anonymous function, not using filter is faster. – Pigna Feb 29 '16 at 19:26
  • 1
    Yes, more generally, passing any functions to higher order functions like `filter` is a performance trap in 0.4. – mbauman Feb 29 '16 at 19:28
  • Why does that happen? Because not-compound functions have no performance trade-off. For instance, to use `check(x)` or `x % 2 == 0` has identical performance in the first case – Pigna Feb 29 '16 at 19:36
  • 3
    Try running `main()` twice, `main(); main()`. The first time `check` is called it is compiled, and the test is not long enough to make this negligible. – Dan Getz Feb 29 '16 at 19:38
  • 3
    Julia is fast because it's able to generate specialized code for different argument *types*. When you write a normal function call like `f() = sum((2,3))`, Julia is able to see that `sum` is a *constant*, and so it doesn't need to dynamically look up the function at run-time (look at `@code_llvm f()` — it can even pre-compute the result!). But when you pass the function as an argument (`g(fn) = fn((2,3))`), it doesn't know what function you're passing… it could be `sum` or `prod` or any `Function`. Even worse: Julia can't reason about what type it might return, so it's a type instability, too. – mbauman Feb 29 '16 at 21:05
  • In 0.5, each function gets its own type — this means that Julia can now specialize code for different functions, making it just as fast as if you wrote it out yourself. – mbauman Feb 29 '16 at 21:06
13

It's hard to figure out why your code is slow without knowing anything about it, but if you're willing, you could post it to julia-users – lots of people there (myself included) are happy to help with some performance analysis and tweaking. Julia has a pretty simple performance model once you get the hang of it, but it does take a little time to get it. Once you do, it's generally possible to get C-like performance. Here are some answers to your specific questions.

  1. Why 96 bytes? Why does an empty array weight so much?

  2. Why do I get 96 bytes even if I set a = [1,2,3,4]?

In dynamic languages, arrays are runtime objects and metadata about them takes some space. You need to store the type tag of the object, the number and size of dimensions, and flags for memory management. This is pretty standard in dynamic languages – IIRC, in PHP each array has ~400 bytes of overhead, but a PHP "array" is really much more than that. Python and Ruby are probably pretty similar to Julia in terms of array object overhead.

Furthermore, 1-dimensional arrays in Julia are dynamically resizeable via push! and pop! and other similar operations, and are somewhat overallocated to make those operations more efficient. When you grow a vector by pushing elements to it individually, you periodically will need more memory. In order to make this efficient, Julia pre-allocates extra space. As a result, one-element and two-element arrays have the same storage size; so do three-element and four-element arrays. For even moderately large arrays, this overhead is negligible. If you need to store a lot of small arrays, of course, it could become a problem. There are various ways to work around this issue, but it doesn't seem to really be your problem.

  1. Why do I get 937.500 KB

1 KB = 1024 bytes, so 937.5 KB * 1024 bytes/KB = 960000 bytes.

  1. Why is, for instance, filter() so inefficient?

If you use the development version of Julia, this is efficient. This required a massive overhaul of how functions are implemented and how they interact with the type system, which was done by Jeff Bezanson. Here's the performance now:

julia> check(n) = n % 2 == 0
check (generic function with 1 method)

julia> function f1()
           for _ in 1:1000
               a = [1,2,3]
               b = []
               for x in a
                   check(x) && push!(b,x)
               end
               a = b
           end
       end
f1 (generic function with 1 method)

julia> function f2()
           for _ in 1:1000
               a = [1,2,3]
               a = filter(x -> x % 2 == 0, a)
           end
       end
f2 (generic function with 1 method)

julia> @time f1() # compilation
  0.013673 seconds (16.86 k allocations: 833.856 KB)

julia> @time f1()
  0.000159 seconds (3.00 k allocations: 203.281 KB)

julia> @time f2() # compilation
  0.012211 seconds (7.79 k allocations: 449.308 KB)

julia> @time f2()
  0.000159 seconds (3.00 k allocations: 203.281 KB)

The performance is now indistinguishable. That is only available on a recent version of Julia master, not the 0.4 stable release, however, so if you're using a stable release, for maximal performance, you need to write out the filter operation yourself.

Community
  • 1
  • 1
StefanKarpinski
  • 32,404
  • 10
  • 86
  • 111
  • Thanks @StefanKarpinski, I'll post it on julia-users then. But maybe first I'll try the development version. Just one thing: where should I download it from? Because I don't think it's nightly builds from http://julialang.org/downloads/, right? – Pigna Feb 29 '16 at 21:46
  • The nightly builds have been having some issues lately (compile time is too long), so you'd have to build from source. – StefanKarpinski Feb 29 '16 at 22:35
  • Should I uninstall julia 0.4 and `git clone git://github.com/JuliaLang/julia.git `? – Pigna Feb 29 '16 at 22:44
  • 1
    You don't have to uninstall. You can have multiple different Julia builds in different directories. But yes, clone that and do make. It might require giving an architecture specification because OpenBLAS is bad at figuring that out. – StefanKarpinski Mar 01 '16 at 02:44
  • Why do the allocations drop (in both cases) after calling the function once? – jjjjjj Sep 10 '18 at 17:37