
Currently we are using threads in our application (Java), and there will be some 1000 (or more) threads created at a time. These threads process the data and store it in the DB.

This consumes a lot of memory and I/O.

What would be the best alternative? Scalability, consistency and performance are the main requirements.

dafnap
Anil Kumar
  • The basic misconception about threads is that the more the better. This isn't true in general; usually the opposite is what you want: as few threads as you can get away with, not significantly more than the number of CPUs you have. Even if your process is I/O bound and thus most of your threads are waiting for I/O, having more threads may not buy you anything, because it can slow down the systems you're doing I/O with. – biziclop Jul 07 '15 at 08:45
  • Are you using a thread pool already? I hope you're not creating a thousand-odd threads directly. – Ravi K Thapliyal Jul 07 '15 at 08:45
  • That said, your problem is way too general to get a good answer. Akka and Hadoop, for example, are two very different possible answers, which may or may not fit what you want. But it is also possible that an appropriately sized thread pool will be enough. – biziclop Jul 07 '15 at 08:45
  • A limited thread pool? You can't get more work done in parallel than the number of CPU cores you have. If you don't have a thousand cores, there's no real reason to create a thousand threads. Anyway, [fibers](http://www.paralleluniverse.co/quasar/) might be worthwhile for you to look at. – Petr Janeček Jul 07 '15 at 08:45
  • Ah, multi-threading. It promises so much, but delivers so little when you don't know what you're doing... – Kayaman Jul 07 '15 at 08:46
  • @RaviThapliyal Right now we are not using thread pools. We create the threads directly, as many as are needed. – Anil Kumar Jul 07 '15 at 08:51
  • @AnilKumar That should definitely be the first step then. – biziclop Jul 07 '15 at 08:52
  • @AnilKumar Yes, start with a thread pool. Say you have two quad-core CPUs with Hyper-Threading support; then you can configure a thread pool of 2 * 4 * 2 = 16 threads. Your server CPUs will obviously differ from desktop ones; I'm just giving you a general idea. – Ravi K Thapliyal Jul 07 '15 at 09:05
  • Be aware of this when you read all of the suggestions: threads are commonly used for two different reasons. One is as a way to organize programs that must _wait_ for several different sources of asynchronous events. The other is as a means to do parallel processing on a multi-core architecture. Some of the proposed alternatives are oriented more toward the waiting-for-events use case, and some more toward the parallel-processing use case. – Solomon Slow Jul 07 '15 at 13:41

4 Answers


Have you tried thread pools? A thread pool consists of a reasonable number of threads (enough to use all processors, but not many more) and reuses those threads (which avoids the overhead of creating a new thread per task) to execute a large number of tasks concurrently.

Here is a small example to give you an idea:

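import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

ExecutorService executor = Executors.newFixedThreadPool(5);
Runnable job = new Runnable() {
    public void run() {
        // do some work
    }
};
executor.execute(job);
executor.shutdown(); // stop accepting new jobs; already submitted jobs still run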

If you look at ScheduledThreadPoolExecutor, you will find many more features for executing and scheduling jobs.
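As a minimal sketch of those scheduling features, the hypothetical `SchedulingSketch` class below runs one delayed job and one fixed-rate job on a `ScheduledExecutorService` (the executor returned by `Executors.newSingleThreadScheduledExecutor()` is backed by a `ScheduledThreadPoolExecutor`). The delays and run counts are arbitrary illustration values.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SchedulingSketch {

    // One-shot job: runs once, `delayMillis` after being scheduled.
    static int delayedRuns(long delayMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        scheduler.schedule(() -> { runs.incrementAndGet(); }, delayMillis, TimeUnit.MILLISECONDS);
        scheduler.shutdown(); // by default, already-scheduled delayed jobs still run
        awaitQuietly(scheduler);
        return runs.get();
    }

    // Periodic job: first run immediately, then every `periodMillis`,
    // until it has fired at least `times` times.
    static int periodicRuns(int times, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);
        scheduler.scheduleAtFixedRate(() -> {
            runs.incrementAndGet();
            done.countDown();
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(); // block until the job has fired `times` times
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        scheduler.shutdownNow(); // stop the periodic job
        awaitQuietly(scheduler);
        return runs.get();
    }

    private static void awaitQuietly(ScheduledExecutorService scheduler) {
        try {
            scheduler.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("delayed job ran " + delayedRuns(100) + " time(s)");
        System.out.println("periodic job ran " + periodicRuns(3, 20) + " time(s)");
    }
}
```

The periodic job may fire once more between the latch opening and `shutdownNow()`, so the count is "at least" `times`, not exactly `times`.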

Cephalopod
  • If I go with a thread pool of size, say, 100, then 100 DB connections will be created, which is the same number of connections as without a thread pool. Can you please suggest a way to create DB connections and let them be shared by all the threads? – Anil Kumar Jul 08 '15 at 12:47
  • Seems to me like you should open a new question and give more details about what you are doing and why. In the meantime, in most cases your thread pool should have at most twice as many threads as your server has cores, and fewer if your database runs on the same server. – Cephalopod Jul 08 '15 at 12:56

Take a look at the actor model.

The actor model is a concurrent programming model, in which the workload is distributed between entities running in parallel, called actors.

It is a model with no shared state: actors are isolated, and information flows between them only in the form of messages.

Actors receive these messages and can react only by processing the data in the message, sending messages to other actors, or creating new actors.

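This model is a high-level abstraction over mutexes, locks and threads, which removes much of that complexity for the developer. The actor model itself was proposed by Carl Hewitt and colleagues in 1973; Erlang, developed at Ericsson in the 1980s mainly to build highly available, concurrent telecom systems, is the best-known language built around this style of concurrency.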

Actors are very lightweight concurrent entities. They process messages asynchronously using an event-driven receive loop. Pattern matching against messages is a convenient way to express an actor's behavior. Actors raise the abstraction level and make it much easier to write, test, understand and maintain concurrent and/or distributed systems. You can focus on workflow (how messages flow through the system) instead of low-level primitives like threads, locks and socket I/O.
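To make the mailbox-plus-receive-loop idea concrete, here is a minimal hand-rolled sketch in plain Java. It is not Akka and not how any particular framework implements actors; `ActorSketch`, `MiniActor` and `tell` are hypothetical names. Each actor owns a mailbox and a behavior, and many actors can share one small thread pool, because an actor only occupies a thread while it is draining its mailbox.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

class ActorSketch {

    // Hypothetical minimal actor: a private mailbox plus a behavior.
    static final class MiniActor<M> {
        private final ConcurrentLinkedQueue<M> mailbox = new ConcurrentLinkedQueue<>();
        private final AtomicBoolean scheduled = new AtomicBoolean(false);
        private final Executor executor;   // may be shared by many actors
        private final Consumer<M> behavior;

        MiniActor(Executor executor, Consumer<M> behavior) {
            this.executor = executor;
            this.behavior = behavior;
        }

        // Asynchronous send: enqueue the message and return immediately.
        void tell(M message) {
            mailbox.add(message);
            trySchedule();
        }

        // The compare-and-set ensures at most one drain runs at a time, so the
        // behavior handles one message at a time and needs no locks.
        private void trySchedule() {
            if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
                executor.execute(this::drain);
            }
        }

        // Receive loop: process queued messages one by one.
        private void drain() {
            M message;
            while ((message = mailbox.poll()) != null) {
                behavior.accept(message);
            }
            scheduled.set(false);
            trySchedule(); // a message may have arrived after the last poll
        }
    }

    // Sends 1..n to a summing actor running on a 2-thread pool; returns the total.
    static long sumWithActor(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        long[] total = {0}; // actor-owned state: only the behavior mutates it
        CountDownLatch received = new CountDownLatch(n);
        MiniActor<Integer> summer = new MiniActor<>(pool, value -> {
            total[0] += value;
            received.countDown();
        });
        for (int i = 1; i <= n; i++) {
            summer.tell(i);
        }
        try {
            received.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return total[0];
    }

    public static void main(String[] args) {
        System.out.println(sumWithActor(1000)); // 500500
    }
}
```

Real actor frameworks add supervision, addressing and distribution on top of this; the sketch only shows why the actor's own state needs no locks.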

For Java and Scala, there is the Akka framework, which is built on this actor model.

eliasah
  • Actors tend to be significantly slower than pure threads. This may not matter in many contexts, but it is good to keep in mind when raw speed is critical. They also tend to be a bit verbose and, IMHO, more difficult to troubleshoot. – WestCoastProjects Oct 02 '15 at 06:18
  • But what about its use in Spark, for example? If I'm not mistaken, Spark uses the actor model. – eliasah Oct 02 '15 at 06:20
  • Spark does not run many, many threads per JVM. By "many" I mean hundreds or more; it is at that level that the performance difference really shows. – WestCoastProjects Oct 02 '15 at 06:23

Use a thread pool. That way you can define a number of threads that you want to have running. Each new task is put into a queue and waits there until a thread is done with its old task and thus free to process a new task.
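A minimal sketch of that queueing behaviour, assuming a hypothetical `processRecord` method stands in for "process the data and store it in the DB": 1000 jobs are submitted, but only `threads` worker threads ever exist, and the surplus jobs wait in the pool's internal queue until a worker is free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class QueuedPoolSketch {

    // Hypothetical stand-in for "process the data and store it in the DB".
    static int processRecord(int record) {
        return record * 2;
    }

    // Submits `tasks` jobs to a pool of `threads` workers and sums the results.
    static long runAll(int tasks, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Integer>> jobs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                final int record = i;
                jobs.add(() -> processRecord(record));
            }
            long sum = 0;
            // invokeAll blocks until every job has finished.
            for (Future<Integer> result : pool.invokeAll(jobs)) {
                sum += result.get();
            }
            return sum;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 1000 jobs, but only 8 worker threads are ever created.
        System.out.println(runAll(1000, 8));
    }
}
```

Changing the second argument changes only how many jobs run at once, not how many jobs get done.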

This is scalable, because you can define how many threads you want to have running. You can choose few threads on a device with few processing cores to conserve memory and reduce synchronization overhead, or many threads on a device with many cores. For example, if you run this on a device with 4 cores and hyperthreading, choose 8 threads; if you run it on a device with 48 hardware threads, choose 48 threads.

Performance is generally better than starting a new thread for each task, since starting and stopping threads has considerable overhead. Thread pools reuse threads and thus avoid that overhead.

It is also consistent, since there is a thread pool implementation in the Java standard library (java.util.concurrent).
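As a sketch of that sizing rule, the JVM can report how many processors it may use via `Runtime.availableProcessors()`, which typically counts hyperthreads (so usually 8 on a 4-core hyperthreaded machine). `PoolSizing` is a hypothetical name, and using exactly this number is only a starting point for CPU-bound work.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizing {

    // Number of processors available to the JVM (logical cores).
    static int cpuBoundPoolSize() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        int size = cpuBoundPoolSize();
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("Using " + size + " worker threads");
        pool.shutdown();
    }
}
```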

Dakkaron
  • As a side note, thread pool sizing is tricky business. The rule of thumb of using exactly as many threads as you can run in parallel is a good starting point, but to get the most out of it you should monitor your application and tweak things accordingly. For example, if you're doing lots of I/O, a slightly bigger pool may be better; if you use the parallel GC and its overhead is too big, leave a few cores for the GC; and so on. – biziclop Jul 07 '15 at 08:55
  • @Dakkaron, thanks for the suggestion. But my concern is: if I use 48 threads at a time, won't it take a lot of time to run my 1000 threads? – Anil Kumar Jul 07 '15 at 08:57
  • it would be nice if the pool parameters could be changed while running – Skaperen Jul 07 '15 at 08:58
  • @AnilKumar Overall it may be quicker. Imagine pouring water out of a bottle: which is better, holding it vertically or holding it at an angle? Water flows faster when you hold the bottle vertically, but you lose a lot of time to air bubbles going up. It's similar here. The important thing is to measure performance before and after a change, because each problem is different. – biziclop Jul 07 '15 at 08:59
  • how long do you expect each thread to need to run? – Skaperen Jul 07 '15 at 08:59
  • @biziclop, great explanation... Do you think MQs could do the job for me? Please share your thoughts. – Anil Kumar Jul 07 '15 at 09:02
  • As biziclop said: it depends on what you are doing. If the 1000 threads do a lot of CPU-intensive work, then a thread pool will probably be faster, since your hardware can only run a limited number of threads at a time. If you have more software threads than hardware threads, the CPU can't run all of your threads at the same time; instead it runs as many threads as possible at a time and then switches to the inactive ones. These context switches are quite expensive, so fewer threads might speed up your tasks. – Dakkaron Jul 07 '15 at 09:20
  • @AnilKumar My advice is to start with the simplest solution and measure whether it's enough. I would first create some kind of service that you can submit tasks to, and rewrite the application to use that service instead of creating threads directly. Then you can implement the service using a thread pool. Measure the performance, and if you can't get enough out of it, move on to a slightly more complicated solution, such as MQs. – biziclop Jul 07 '15 at 09:21
  • On the other hand, if your threads are mostly waiting (e.g. for network I/O), then having more threads might speed up the process. But again, if all the threads are fighting for the same limited resource (e.g. hard drive I/O), having fewer threads might speed it up again. So in the end, to get the best performance, you should just try a few different configurations and see what works best. – Dakkaron Jul 07 '15 at 09:22
  • TL;DR: If the threads are fighting over a limited resource -> more threads won't increase the speed, but rather decrease it. If they are disjoint -> more threads will increase the speed. – Dakkaron Jul 07 '15 at 09:49
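Following the advice in these comments to measure and to try a few configurations, here is a minimal sketch that times the same batch of tasks at several pool sizes. It also shows that the standard `ThreadPoolExecutor` can be resized while it is running, through `setCorePoolSize` and `setMaximumPoolSize`. `PoolTuningSketch`, `resize` and `timeRun` are hypothetical names, and `Thread.sleep` is only a stand-in for real database waits, so the printed timings say nothing about a real workload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolTuningSketch {

    // Same configuration as Executors.newFixedThreadPool(size), but kept as a
    // ThreadPoolExecutor so its size can be changed later.
    static ThreadPoolExecutor newResizablePool(int size) {
        return new ThreadPoolExecutor(size, size, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Changes the worker count of a running pool. The setters are ordered so
    // that the core size never exceeds the maximum size.
    static void resize(ThreadPoolExecutor pool, int size) {
        if (size > pool.getMaximumPoolSize()) {
            pool.setMaximumPoolSize(size);
            pool.setCorePoolSize(size);
        } else {
            pool.setCorePoolSize(size);
            pool.setMaximumPoolSize(size);
        }
    }

    // Runs `tasks` simulated I/O-bound tasks (each just sleeps `taskMillis`)
    // and returns the elapsed wall-clock time in milliseconds.
    static long timeRun(ThreadPoolExecutor pool, int tasks, long taskMillis) {
        List<Callable<Void>> jobs = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            jobs.add(() -> {
                Thread.sleep(taskMillis); // stands in for waiting on the database
                return null;
            });
        }
        long start = System.nanoTime();
        try {
            pool.invokeAll(jobs); // blocks until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newResizablePool(4);
        for (int size : new int[] {4, 16, 64}) {
            resize(pool, size); // same pool, new size, no restart
            System.out.println(size + " threads: " + timeRun(pool, 200, 10) + " ms");
        }
        pool.shutdown();
    }
}
```

With tasks that mostly wait, larger pools will likely finish sooner; with CPU-bound tasks or a shared bottleneck, the gains would be expected to flatten or reverse, which is exactly what such a measurement is meant to reveal.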

I think you don't need an alternative to multi-threading, just a more efficient thread implementation.

Quasar adds fibers (i.e. lightweight threads) to the JVM, of which you can create millions rather than a few hundred, so you can get the same performance as async frameworks without giving up the thread abstraction and the regular imperative control-flow constructs (sequence, loops, etc.) available in the language.

It also unifies the JVM/JDK's threads and its fibers under a common strand interface, so they can interoperate seamlessly, and it provides a port of java.util.concurrent to this unified concept. This also means your porting effort will be minimal (if any).

On top of strands (either fibers or regular threads), Quasar also offers fully fledged Erlang-style actors (see here for a comparison with Akka), blocking Go-like channels and dataflow programming, so you can choose the concurrent programming paradigm that best suits your skills and needs without being forced into one.

It also provides bindings for popular and standard technologies (as part of the Comsat project), so you can preserve your code assets because the porting effort will be minimal (if any). For the same reason you can also opt-out easily, should you choose to.

Currently Quasar has bindings for Java 7 and 8, Clojure under the Pulsar project and JetBrains' Kotlin. Being based on JVM bytecode instrumentation, Quasar can really work with any JVM language if an integration module is present, and it offers tools to build additional ones.

Starting with Java 9, instrumentation will be automatic and no integration modules will be needed anymore.

circlespainter
  • How do I use fibers? Is there a tutorial? – Anil Kumar Jul 07 '15 at 10:39
  • The docs of Quasar, Pulsar and Comsat linked above are pretty comprehensive and good as tutorials; they also link a Quasar Maven archetype and its corresponding Gradle template, a Comsat Maven archetype and its corresponding Gradle template, Comsat examples, a Comsat Ring Leiningen template, quasar-stocks ported from Akka/Play and a Comsat-jOOQ example. Lots of information and tutorials can be found in the [blog](http://blog.paralleluniverse.co) such as a screencast and others. The Google Quasar/Pulsar and Comsat forums are a great source as well. – circlespainter Jul 07 '15 at 10:48
  • It looks like this is a technology that your employer developed. You should probably declare your affiliation: http://meta.stackexchange.com/questions/57497/limits-for-self-promotion-in-answers#59302 – James_pic Jul 07 '15 at 15:36
  • Thanks for pointing that out; yes, I am part of the Quasar/Pulsar/Comsat development team. – circlespainter Jul 08 '15 at 06:45