
I asked a related question before: "Why OCaml's threading is considered as `not enough`?"

No matter how "bad" OCaml's threading is, I notice that some libraries say they can do real threading.

For example, Lwt

Lwt offers a new alternative. It provides very light-weight cooperative threads; "launching" a thread is a very fast operation, it does not require a new stack, a new process, or anything else. Moreover context switches are very fast. In fact, it is so easy that we will launch a thread for every system call. And composing cooperative threads will allow us to write highly asynchronous programs.

Jane Street's async_core also provides similar things, if I am right.


But I am quite confused. Do Lwt or async_core provide threading like Java threading?

If I use them, can I utilise multiple CPUs?

How can I get "real threading" (just like in Java) in OCaml?


Edit

I am still confused.

Let me add a scenario:

I have a server (16 cpu cores) and a server application.

The server application does the following:

  • It listens to requests
  • For each request, it starts a computational task (say, one that takes 2 minutes to finish)
  • When each task finishes, the task will either return the result back to the main or just send the result back to client directly

In Java, it is very easy. I create a thread pool, then for each request, I create a thread in that pool. That thread will run the computational task. This is mature in Java and it can utilize the 16 CPU cores. Am I right?

So my question is: can I do the same thing in OCaml?

Jackson Tale
  • I believe it should be possible to use the threading facilities of the JVM with the port of OCaml to that platform, but I have not tried it yet. – didierc May 15 '13 at 23:38

2 Answers


The example of a parallelized server that you cite is one of those embarrassingly parallel problems that are well solved with a simple multiprocessing model, using fork. This has been doable in OCaml for decades, and yes, you will get an almost linear speedup using all the cores of your machine if you need it.

To do that using the simple primitives of the standard library, see this chapter of the online book "Unix system programming in OCaml" (first released in 2003), and/or this chapter of the online book "Developing Applications with OCaml" (first released in 2000).
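As a rough illustration of the fork-based model, here is a minimal sketch using only `Unix` and `Marshal`. The names `fib`, `spawn`, `collect` and `parallel_map` are hypothetical and not taken from either book; `fib` merely stands in for the expensive per-request computation, and error handling is omitted.

```ocaml
(* Requires the unix library,
   e.g. ocamlfind ocamlopt -package unix -linkpkg server.ml *)

(* Hypothetical stand-in for the expensive per-request computation. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* Fork a child process that computes [f x] and writes the marshalled
   result to a pipe. Returns the child's pid and the pipe's read end. *)
let spawn f x =
  let rd, wr = Unix.pipe () in
  match Unix.fork () with
  | 0 ->
    (* Child: compute, send the result to the parent, then leave. *)
    Unix.close rd;
    let oc = Unix.out_channel_of_descr wr in
    Marshal.to_channel oc (f x) [];
    close_out oc;
    (* Unix._exit (OCaml 4.12+) skips at_exit handlers inherited
       from the parent. *)
    Unix._exit 0
  | pid ->
    (* Parent: keep only the read end. *)
    Unix.close wr;
    (pid, rd)

(* Read one child's result, then reap the child. *)
let collect (pid, rd) =
  let ic = Unix.in_channel_of_descr rd in
  let result = Marshal.from_channel ic in
  close_in ic;
  ignore (Unix.waitpid [] pid);
  result

(* Start every child first so they all run at once, then collect. *)
let parallel_map (f : 'a -> 'b) (xs : 'a list) : 'b list =
  let children = List.map (spawn f) xs in
  List.map (fun child -> (collect child : 'b)) children

let () =
  let results = parallel_map fib [25; 26; 27] in
  List.iter (Printf.printf "%d\n") results
```

Each task runs in its own process, so the operating system is free to schedule the tasks on different cores. The cost is that results must be copied back through the pipe, which is why this model suits long computations with small results.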

You may also want to use higher-level libraries such as Gerd Stolpmann's OCamlnet library mentioned by rafix, which provides a lot of functionality, from direct helpers for the usual client/server design to lower-level multiprocess communication libraries; see the documentation.

The library Parmap is also interesting, but maybe for a slightly different use case (it is more for when you have a large array of data available all at the same time that you want to process with the same function in parallel): it is a drop-in replacement for Array.map or List.map (or fold) that parallelizes computations.
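A minimal sketch of the Parmap style, assuming the third-party `parmap` package is installed. The names `fib` and `parallel_fib` are hypothetical, and the core count is an arbitrary choice for illustration.

```ocaml
(* Requires the third-party parmap library (opam install parmap). *)

(* Hypothetical stand-in for an expensive pure function. *)
let rec fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

(* Same shape as List.map fib inputs, but the work is split across
   [ncores] forked worker processes. Parmap.L wraps a list input;
   Parmap.A would wrap an array. *)
let parallel_fib ~ncores inputs =
  Parmap.parmap ~ncores fib (Parmap.L inputs)

let () =
  let results = parallel_fib ~ncores:4 [30; 31; 32; 33] in
  List.iter (Printf.printf "%d\n") results
```

Because the workers are separate processes, `fib` should not rely on mutable state shared with the parent: changes made in a worker are not visible after it returns.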

gasche
  • Ok, so if we can use `fork` in this kind of scenario, why `lwt` or `async_core` exist? I mean what are the differences between them and `fork`? Are `lwt` or `async_core` multi-core enabled? – Jackson Tale May 15 '13 at 16:09
  • @rgrinberg I was confused by your answer, sorry, maybe it is because I really don't understand what a `thread` is now. From my understanding of the Java world, `threading` was born with the ability to utilise multiple CPUs. But from your answer and from the documents I have read so far, that is not the case. – Jackson Tale May 15 '13 at 16:12
  • First you need to understand the difference between concurrency and parallelism. Once you understand that, you will see that `fork` and its family of tools are used to solve the parallelism problem, while `lwt` and `async` are used to solve the concurrency problem. In Java, threads are used for both problems (because the JVM allows for it), while OCaml's native threads are bad for both concurrency and parallelism. – rgrinberg May 15 '13 at 16:14
  • @rgrinberg ok, i understand more now. so when we talk about `Java Concurrency`, we are actually talking about `Java Concurrency & Parallelism`, right? – Jackson Tale May 15 '13 at 16:21
  • When we talk about Java's threading then yes, we could mean that we are using threads either for parallelism or for concurrency. Example of parallelism: splitting a big job into N tasks and having the threads work individually. Example of concurrency: having UI updates occur in their own thread so that expensive operations do not make the UI unresponsive. – rgrinberg May 15 '13 at 17:42

The closest thing you will find to real (preemptive) threading is the built-in threading library. By that I mean that your programming model will be the same, but with 2 important differences:

  • OCaml's native threads are not lightweight like Java's.
  • Only a single thread executes at a time, so you cannot take advantage of multiple cores.

This makes OCaml's threads a pretty bad solution to either concurrency or parallelism so in general people avoid using them. But they still do have their uses.
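To show what "the same programming model" looks like, here is a small sketch with the standard `Thread` and `Mutex` modules. The names `counter`, `lock` and `work` are hypothetical. Under the runtime described in this answer, the four threads take turns rather than running on four cores.

```ocaml
(* Requires the threads library,
   e.g. ocamlfind ocamlopt -package threads.posix -linkpkg demo.ml *)

(* Shared mutable state, protected by a mutex as it would be in Java. *)
let counter = ref 0
let lock = Mutex.create ()

(* Each thread increments the shared counter [n] times. *)
let work n =
  for _i = 1 to n do
    Mutex.lock lock;
    incr counter;
    Mutex.unlock lock
  done

let () =
  (* Start four preemptive threads, then wait for all of them. *)
  let threads = List.init 4 (fun _ -> Thread.create work 1000) in
  List.iter Thread.join threads;
  Printf.printf "counter = %d\n" !counter
```

The code is structured just like its Java counterpart (create, synchronise, join), which is the point of the comparison; the difference lies in how the runtime schedules the threads.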

Lwt and Async are very similar and provide a different flavour of threading: a cooperative style. Cooperative threads differ from preemptive ones in that context switching between threads is explicit in the code, and blocking calls are always apparent from the type signature. The cooperative threads provided are very cheap, so they are well suited to concurrency, but again they will not help you with parallelism (due to the limitations of OCaml's runtime).
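A minimal sketch of the cooperative style, assuming the third-party `lwt` package with its `lwt.unix` sublibrary. The names `worker` and `run_demo` are hypothetical. Both workers wait concurrently inside a single process, on a single core.

```ocaml
(* Requires the third-party lwt library, including lwt.unix
   (opam install lwt). *)
open Lwt.Infix

(* A cooperative thread. The result type [unit Lwt.t] shows in the
   signature that it may block; control returns to the scheduler while
   the sleep is pending, and the code after >>= runs once it is done. *)
let worker name delay log : unit Lwt.t =
  Lwt_unix.sleep delay >>= fun () ->
  log := name :: !log;
  Lwt.return_unit

(* Run two workers concurrently and return their names in the order
   in which they finished. *)
let run_demo () : string list =
  let log = ref [] in
  Lwt_main.run
    (Lwt.join [ worker "slow" 0.2 log; worker "fast" 0.1 log ]);
  List.rev !log

let () = List.iter print_endline (run_demo ())
```

No lock is needed around `log`: a cooperative thread is only interrupted at the points where it explicitly waits, so the update cannot be preempted halfway through. By the same token, a long pure computation inside a worker would stall every other worker.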

See this for a good introduction to cooperative threading: http://janestreet.github.io/guide-async.html

EDIT: For your particular scenario I would use Parmap. If the tasks are as computationally intensive as in your example, the overhead of starting the processes from Parmap should be negligible.

rgrinberg
  • So there is no way to utilise multiple CPUs? – Jackson Tale May 15 '13 at 11:48
  • @Jackson: You can use multiple processes. There are libraries available to make things easier: [parmap](http://rdicosmo.github.io/parmap/), [nproc](https://github.com/MyLifeLabs/nproc), [net-multicore](http://blog.camlcity.org/blog/multicore3.html). But of course, it's not the same as threads running in parallel. – rafix May 15 '13 at 11:56
  • To add to rafix's comment: the only time when you are screwed is when you need fast mutable shared memory between the processes. – rgrinberg May 15 '13 at 12:00