9

I've read through quite a lot of documentation and questions, but I'm still confused about this.

In the Profiling section of the documentation it's suggested to first run the target function in the REPL once, so that it's already compiled before being profiled. However, what if the script is fairly complicated and is intended to be run from the command line, taking arguments? When the julia process finishes and I run the script a second time, is the compilation performed again? Posts like https://stackoverflow.com/a/42040763/1460448 and "Julia compiles the script every time?" give conflicting answers. They also seem to be old, while Julia is constantly evolving.

In my experience, the second run takes exactly as much time as the first, and the startup time is quite long. How should I optimize such a program? Adding `__precompile__()` doesn't seem to have changed the execution time at all.

Also, what should I do when I want to profile such a program? All resources on profiling talk about doing so in the REPL.

xji
  • Yes, it is recompiled every time it is run from outside the REPL, unless you are able to compile to a binary (which is a bit difficult, I believe.) The recommended workflow is working in the REPL, and calling things from there. Also, if you want good performance, remember to wrap your code in a function. I don't see why taking arguments means you must run it from a command line. Turn it into a function, and call the function with the arguments as input variables. – DNF May 30 '18 at 22:06
  • @DNF Thanks. I do have everything in functions. It's just that I'm running a model that takes very long to finish (and eventually saves a trained model file or shows the results on the test set), therefore such a program is conventionally run through the command line, just as is the case with e.g. C++, pypy or Tensorflow. Now that I understand Julia works this way, I'll just do everything in the REPL instead. Although doesn't it make Julia a bit awkward for long-running tasks for which people are used to doing things in a different way? – xji May 31 '18 at 09:25
  • Personally, I don't see why it is more awkward to run the program at the Julia prompt, vs for example a bash prompt. Whether I'm using Matlab, Python or Julia, I always run things from within the repl. You can equally well save results from within Julia. – DNF May 31 '18 at 09:29
  • That's true. One problem I can think of might be when you need to display help messages to the user showing which arguments are needed to run the program and what each flag means, just as a traditional command-line program. The users can also feed in the arguments after corresponding flags. With a pure function call it will be more convoluted to do that? That's why I feel Julia is not suited to command-line usage yet and is really more tailored towards a REPL workflow at the moment. If binary executables can be compiled it would be much better, though that probably goes against the dynamism. – xji May 31 '18 at 12:56
  • Also the long startup time goes against using it as a command-line program. – xji May 31 '18 at 12:57
  • From the REPL you can display a docstring if the user types this: `?myfunction`. As for inputs, Julia functions support both positional input arguments, as well as keyword arguments, like this: `myfunction(arg1, arg2, tolerance=0.1, verbose=false)`. – DNF May 31 '18 at 13:02
  • The link to the modules manual page is now [here](https://docs.julialang.org/en/v1/manual/modules/) – BuhtanDingDing Apr 20 '22 at 19:58

2 Answers

14

I disagree somewhat with my colleagues. There are absolutely valid scenarios where one would rely on running julia scripts. E.g. when you have a pipeline of scripts (e.g. matlab, python, etc) and you need to plug in a julia script somewhere in the middle of all that, and control the overall pipeline from a shell script. But, whatever the use case, saying "just use the REPL" isn't a proper answer to this question, and even if one couldn't come up with "valid" scenarios, it is still a question worth answering directly rather than with a workaround.

What I do agree on is that the way to get appropriately compiled code is to wrap everything critical that needs to be precompiled into modules, and leave only the most external commands at the script top-level. This is not too dissimilar to the matlab or C++ world anyway, where you're expected to write thorough functions, and only treat your script / main function as some sort of very brief, top-level entry point whose job is to simply prepare the initial environment, and then run those more specialised functions accordingly.

Here's an example of what I mean:

    # in file 'myscript.jl'
    push!(LOAD_PATH, "./")   # let Julia find modules in the current directory
    import MyPrecompiledModule
    println("Hello from the script. The arguments passed into it were $ARGS")
    MyPrecompiledModule.exportedfun()

    # in file 'MyPrecompiledModule.jl' (e.g. in the same directory as myscript.jl)
    __precompile__()   # optional on Julia 1.0 and later, where precompilation is the default
    module MyPrecompiledModule
      export exportedfun

      function innerfun()
        println("Hello from MyPrecompiledModule.innerfun")
      end

      function exportedfun()
        innerfun()
        println("Hello from MyPrecompiledModule.exportedfun")
      end
    end

In the above scenario, the script uses the compiled version of MyPrecompiledModule (and if one does not exist, it is compiled the first time you run the script). The work done when compiling the module is therefore not lost at the end of the script, but you still end up with a standalone julia script that you can use as part of a bash shell script pipeline, and that you can pass arguments to. The myscript.jl script then only has to pass these on to the imported module functions if necessary, and run any other commands for which you don't particularly care whether they are compiled / optimised or not, such as performing benchmarks, providing script usage instructions, etc.
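As a rough sketch of what such a thin top-level script could look like when it also prints usage instructions and optionally profiles the work: the names `train`, `usage` and `main`, and the `--profile` flag, are invented for this illustration, and `train` is defined inline only to keep the snippet self-contained (in the layout above it would live in the module). The warm-up call follows the advice quoted in the question, namely to run the function once before profiling it.

```julia
# in file 'myscript.jl' (hypothetical variant; `train`, `usage` and `main` are made-up names)
using Profile

# Stand-in for an expensive function that would normally live in the module.
function train(n::Int)
    acc = 0.0
    for i in 1:n
        acc += sin(i) / i
    end
    return acc
end

usage() = println("usage: julia myscript.jl [--profile] N")

function main(args)
    # Show the usage text when asked for it or when no arguments are given.
    if isempty(args) || "--help" in args
        usage()
        return nothing
    end
    n = tryparse(Int, last(args))
    if n === nothing
        usage()
        return nothing
    end
    if "--profile" in args
        train(min(n, 10))   # warm-up call, so that compilation is not profiled
        Profile.clear()
        @profile train(n)
        Profile.print()
        return nothing
    end
    result = train(n)
    println("result = ", result)
    return result
end

main(ARGS)
```

Under these assumptions, `julia myscript.jl 1000000` would run the job and `julia myscript.jl --profile 1000000` would print a profile of it. Keeping everything inside `main` also keeps the script body out of the global scope.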

Tasos Papastylianou
  • PS. There is also the option of relying on a module's `__init__()` function, though I'd prefer to split things as above, personally. [This other answer](https://stackoverflow.com/a/49405645/4183191) of mine to a similar question may also be of interest to you. – Tasos Papastylianou May 31 '18 at 15:24
  • This seems reasonable and I hope it can improve the performance considerably so that julia might be used in the middle of a pipeline. However, from what I read, it seems that `__precompile__()` won't necessarily compile everything, but only part of it? – xji May 31 '18 at 19:37
  • This is not a straightforward question, and I'm not sure of the implementation details myself, but yes, essentially it will compile what it can. However, note that Julia makes a distinction between functions and methods, the former being untyped, and the latter having specific types as arguments. Methods will almost certainly get precompiled, but functions presumably will only be compiled "just-in-time" for specific arguments used during your session; I do not know if these are then saved in some way or not. Also note the part about `__init__()` in the manual, this is never precompiled. – Tasos Papastylianou May 31 '18 at 19:46
  • @xji I agree with everything here except for the assertion that Tasos and I disagree :-) I probably came across a bit strong in my answer and definitely agree that there are times where you might want to call Julia from the command line (I think I slightly misjudged my audience). If we're going to talk seriously about precompilation, you definitely want to read [this awesome answer](https://stackoverflow.com/questions/40116045/why-is-julia-taking-a-long-time-on-the-first-call-into-my-module?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa). – Colin T Bowers May 31 '18 at 22:58
  • I disagree that it is not a proper answer to suggest using the REPL. The question was about re-compilation, which is solved by using the REPL. Secondly, there were some apparent misconceptions about what you could do from the REPL, relating to docstrings and input arguments. I didn't say there are no valid scenarios for running scripts, it just seemed that in this particular case, suggesting the REPL was a good idea, and not a 'workaround'. – DNF Jun 01 '18 at 07:25
  • @DNF I understand the usage of docstrings and input arguments. But they exist for other languages as well. Still this doesn't make directly using REPL suitable for some scenarios e.g. using it in the middle of a pipeline, which you might want to do when you're transforming data for example. I do get that REPL is the current "default" way of using Julia in a sense though. – xji Jun 06 '18 at 20:38
  • I tried this approach with the library [ArgParse.jl](https://github.com/carlobaldassi/ArgParse.jl) but unfortunately the performance is still unsatisfactory. A test function took twice as long to finish (18 seconds) compared with when it's executed in the REPL after first loading the module (9s). Even without using this external library it still took 16s. I guess there's no perfect solution until binaries can be produced, and I would suggest in the project README to prefer the REPL usage for now. – xji Jun 06 '18 at 21:08
  • @xji that sounds odd. It shouldn't unless you're doing something else that requires compilation in your session. Also, with benchmarks it's always prudent to perform multiple tests and obtain an average (I assume you're already avoiding top-level variables and functions based on the previous discussion). In any case, whether you import a precompiled module from a script or from a REPL session should be immaterial; I don't think precompilation is your problem here. – Tasos Papastylianou Jun 07 '18 at 07:25
  • I'm getting `ERROR: LoadError: ArgumentError: Package MyPrecompiledModule not found in current path` – Slaus Jan 04 '22 at 12:01
  • @Slaus I can confirm that although the above code is quite old now (from julia v0.5), it still works in the latest julia (v1.7.1). If you're getting this error, then it's either because `MyPrecompiledModule.jl` is not in the same directory as in my example, or you're not running the example from its own directory. Note that in `myscript.jl` I did `push!( LOAD_PATH, "./" )` under the assumption that this is where you'd find the module. If you saved the module somewhere else, you need to make the load path know where that somewhere else is so that it can load modules from that location. – Tasos Papastylianou Jan 06 '22 at 07:01
  • @TasosPapastylianou thank you for your response! I'm just 2 days into Julia and I tried different ways of loading code from other files (which originally I intended to force Julia compiler **first compile** the code and only **then execute**). I can't remember how exactly I came to my current solution, but translating to provided example it ended up being: `push!( LOAD_PATH, "./" ); include("./MyPrecompiledModule.jl"); using .MyPrecompiledModule; include("./another_module.jl"); using .another_module;` – Slaus Jan 06 '22 at 10:33
  • I'm not sure what you mean (or if this comment is a question in itself). If yes, I think it would be best (and more useful to other users and yourself) to ask a new question on it, so you can display code properly and get a nice answer on it. However, in general, I think if you include a file in this way, you will cause compilation of the 'included' module every time. This is not the same as 'using' a module defined in a file, let alone a precompiled one. – Tasos Papastylianou Jan 08 '22 at 17:17
5

Please correct me if I am wrong, but it sounds like you have written some long script, say, `myfile.jl`, and then from your OS command line you are calling `julia myfile.jl args...`. Is this correct? Also, it sounds like `myfile.jl` does not define much in the way of functions, but is instead just a sequence of commands. Is this correct? If so, then as has been suggested in the comments on the question, this is not the typical work-flow for julia, for two reasons:

1) Calling julia from the command line, i.e. `julia myfile.jl args...`, is equivalent to opening a REPL, running an `include` command on `myfile.jl`, and then closing the REPL. The initial call to `include` will compile any methods that are needed for the operations in `myfile.jl`, which takes time. But since you're running from the command line, once the `include` is finished, the REPL automatically closes, and all that compiled code is thrown away. This is what DNF means when he says the recommended workflow is to work within a single REPL session, and not to close it until you are done for the day, or unless you deliberately want to recompile all the methods you are using.

2) Even if you are working within a single REPL session, it is extremely important to wrap pretty much everything you do in functions (this is a very different workflow to languages like Matlab). If you do this, Julia will compile methods for each function that are specialized on the types of the input arguments that you are using. This is essentially why Julia is fast. Once a method has been compiled, it remains available for the entire REPL session, but is disposed of when you close the REPL. Critically, if you do not wrap your operations in functions, then this specialized compilation does not occur, and so you can expect very slow code. In julia, we call this "working in the global scope". Note that this feature of Julia encourages a coding style consisting of breaking your tasks down into lots of small specialized functions rather than one behemoth consisting of 1000 lines of code. This is a good idea for many reasons. (In my own codebase, many functions are single-liners, and most are 5 lines or less.)
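To make the second point concrete, here is a minimal sketch that does the same work once in the global scope and once inside a function. The names `data`, `total` and `sum_of_squares` are made up for the illustration, and the actual timings will depend on the machine and the Julia version, so none are quoted here.

```julia
# Hypothetical example: sum of squares, first in global scope, then in a function.
data = rand(100_000)

# Global-scope loop: `total` and `data` are untyped globals, so the
# compiler cannot specialise this loop on their types.
total = 0.0
for x in data
    global total += x^2
end

# The same work wrapped in a function: the first call compiles a method
# specialised on the concrete argument type (here Vector{Float64}).
function sum_of_squares(xs)
    acc = zero(eltype(xs))
    for x in xs
        acc += x^2
    end
    return acc
end

sum_of_squares(data)         # first call: includes compilation time
@time sum_of_squares(data)   # second call: reuses the compiled method
```

Wrapping the global loop in `@time` as well gives a rough feel for the gap between the two versions.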

The two points above are absolutely critical to understand if you are working in Julia. However, once you are comfortable with them, I would recommend that you actually put all your functions inside modules, and then call your module(s) from an active REPL session whenever you need them. This has the additional advantage that you can just add a `__precompile__()` statement at the top of your module, and then julia will precompile some (but not necessarily all) of the code in that module. Once you do this, the precompiled code in your module doesn't disappear when you close the REPL, since it is stored on the hard-drive in a `.ji` file. So you can start a new REPL session, type `using MyModule`, and your precompiled code is immediately available. It will only need to re-compile if you alter the contents of the module (and this all happens automatically).
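Since only some of a module's code is precompiled, one option is to name the method signatures you care about with `Base.precompile`. The sketch below assumes a made-up function `scale`; the module is written inline only so that the snippet is self-contained, whereas in the setup described above it would sit in its own file on the load path and be loaded with `using MyModule`. How much of the compiled result ends up in the cache file depends on the Julia version, so treat this as an illustration rather than a guarantee.

```julia
# Hypothetical module illustrating an explicit precompile directive.
module MyModule

export scale

# A generic function: Julia compiles a separate specialisation for each
# combination of concrete argument types it is called with.
scale(xs::AbstractVector, factor::Real) = xs .* factor

# Ask Julia to compile the specialisation for (Vector{Float64}, Float64)
# ahead of the first call with arguments of those types.
precompile(scale, (Vector{Float64}, Float64))

end # module

using .MyModule

println(scale([1.0, 2.0], 2.0))
```

Calls with other argument types, for example a `Vector{Int}`, would still be compiled just-in-time on first use.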

Colin T Bowers
  • Thanks. I do have everything in functions. It's just that I'm running a model that takes very long to finish (and eventually saves a trained model file or shows the results on the test set), therefore such a program is conventionally run through the command line, just as is the case with e.g. C++, pypy or Tensorflow. Now that I understand Julia works this way, I'll just do everything in the REPL instead. Although doesn't it make Julia a bit awkward for long-running tasks for which people are used to doing things in a different way? – xji May 31 '18 at 09:26
  • @xji As DNF says above, I don't really see the difference between working with the command line, or working within a REPL running in a command prompt. But as stated, if you put all your functions in a module and pre-compile it, then for long routines, you shouldn't notice much of a difference. By the way, just to emphasize my point above about small vs long functions, in my own codebase, many functions are single-liners, and most are 5 lines or less. – Colin T Bowers May 31 '18 at 10:36
  • Sure. I understand functional programming concepts fairly well. That's also why I really like Julia. – xji May 31 '18 at 11:55
  • @xji Sorry if I sounded a bit over the top. I'm probably over-sensitive to this since I came from a Matlab background myself, and had to spend quite a bit of time re-training myself into better habits :-) – Colin T Bowers May 31 '18 at 12:11
  • No worries. This will certainly also be helpful to others who look at this answer. – xji May 31 '18 at 12:12
  • One problem I can think of with REPL might be when you need to display help messages to the user showing which arguments are needed to run the program and what each flag means, just as a traditional command-line program. The users can also feed in the arguments after corresponding flags. With a pure function call it will be more convoluted to do that? That's why I feel Julia is not suited to command-line usage yet and is really more tailored towards a REPL workflow at the moment. If binary executables can be compiled it would be much better, though that probably goes against the dynamism. – xji May 31 '18 at 12:57
  • Also the long startup time goes against using it as a command-line program. – xji May 31 '18 at 12:57