0

I'm working on a multithreaded application which needs to perform some optical character recognition. The requirement of the app is that it must work really, really fast.

At one time, I have to simultaneously read 6 different words. So what I'm doing is, starting 6 threads, one thread dedicated to reading each word.

However, I'm wondering if I should go even further, and start one thread for each character within a word. So e.g, if I have 6 words and each word has about 5-6 characters, this would mean 30-36 threads (possibly upto 50-70 threads for longer words).

To process each individual character, it seems to take between 10-30 milliseconds, for a total of 200-300 milliseconds per word. (I need to bring it down to 100 milliseconds or less per word).

Which strategy will actually give me better performance? One thread per word, or one thread per character?

Ali
  • 261,656
  • 265
  • 575
  • 769
  • 2
    Be careful with context switching. – Sotirios Delimanolis Oct 07 '13 at 13:32
  • 5
    ahhhhh! I very much expect it will get slower and slower the more threads you have, especially when noOfThreads>noOfProcessors – Richard Tingle Oct 07 '13 at 13:32
  • `To process each individual character, it seems to take between 10-30 milliseconds` that means that you definitely should **not** go for one thread per character. Way too much context switching... You could probably use **OpenCL** or **CUDA** or anz other GPGPU technology - that seems to be suited for such workloads... – ppeterka Oct 07 '13 at 13:34
  • Yes. Try to keep it to about the same number of threads as your processor will support (cores*hyperthreading?2:1) – AppX Oct 07 '13 at 13:35
  • "100 ms per word" - Is that some well-known set of words or just any word? – Fildor Oct 07 '13 at 13:38
  • @RichardTingle I know.. but the question is, how many threads will cause it to run at optimal speed before it starts to slow down.. – Ali Oct 07 '13 at 14:30
  • @ppeterka66 Can you tell me a little bit about CUDA? Is it just a plugin for any existing computer / laptop or is it a new computer altogether? Can I use the same java program that I have on it? – Ali Oct 07 '13 at 14:31
  • That will depend entirely on your system, the only thing we can say for certain is that noOfThreads>noOfProcessors is bad. Do you have an 18 core processor (*2 for hyper threading), assuming not; 36 threads will give you a bad day – Richard Tingle Oct 07 '13 at 14:32
  • @AppX [Efficiency of HT is very dependent on the algorithm](http://superuser.com/questions/279629/how-much-speedup-does-a-hyper-thread-give-in-theory) so some testing will be required whether multiplicator 1, 2 or maybe some figure between these two yields the best results. – fvu Oct 07 '13 at 14:34
  • http://arashmd.blogspot.com/2013/06/java-threading.html#mgrmem –  Oct 07 '13 at 14:38
  • @ClickUpvote [CUDA](http://en.wikipedia.org/wiki/CUDA) is a nVidia related computing platform for GPUs. Plug in an nVidia card, and off you go. [OpenCL](http://en.wikipedia.org/wiki/OpenCL) (andthe [jOCL binding](http://www.jocl.org/) for Java) seems to be a bit better: it is not tied to manufacturer, the code even runs on CPUs, not only GPUs. – ppeterka Oct 07 '13 at 14:42
  • @ppeterka66 What's the main advantage of CUDA, just that it can run more threads at a time, or will it also run them faster? From reading about it, it seems that it can run more threads but it will run them slower, so I'm not sure if it would offer an overall speed advantage or not.. – Ali Oct 07 '13 at 14:53
  • @ClickUpvote there is no definitive answer to this. Yes, the individual threads might be slower, but how many threads do you have? it is device dependent, and can be as high as several hundreds... You should see if the OCR algorithm itself is able to benefit from this change (my guess is that it should). – ppeterka Oct 07 '13 at 14:57
  • @fvu Disregarding any other limiting factor 1 thread per core thread is about how many threads you want to run when trying to get the most of your application. – AppX Oct 07 '13 at 20:09
  • @AppX check Konrad Rudolph's answer, whether the hyperthreaded cores actually cause a decent speedup depends on the algorithm, with 1 * physical cores you're (obviously) safe in all circumstances, but it's useful to experiment with various values up to 2 * cores. – fvu Oct 07 '13 at 20:52
  • _"it must work really, really fast"_, this basically rules out Java. Like others mentioned above, if the algorithm _lends itself_ to be run on a GPU that is probably the way to go. If not, I'd look at optimizing the algorithms involved and\or seriously consider switching to C\C++. – pauluss86 Dec 21 '13 at 23:11
  • @pauluss86 actually, I was able to make this project work really fast in java. It was able to read blocks of text in about 10 milliseconds for 10-12 characters per block. – Ali Dec 22 '13 at 06:09
  • @ClickUpvote Nice! Obviously, 'really, really fast' should always be interpreted as 'fast enough' :) – pauluss86 Dec 24 '13 at 15:41
  • @pauluss86 10 milliseconds is 'really really fast' for optical char recognition. – Ali Dec 25 '13 at 09:23

1 Answers1

6

Which strategy will actually give me better performance? One thread per word, or one thread per character?

The answer is highly dependent on your hardware architecture and the actually processing being done. Is your processing completely CPU bound or is there any logging or other IO? The best way to answer this is to do performance runs trying various different thread settings with a number of trials to see which one does better. To get the most accurate results, your test runs should be for much longer than a couple of seconds to rule out JIT and other Java initialization.

Couple other thoughts:

  • As mentioned by @Sotirios and others, just throwing more threads at the problem may actually get it to run slower because of context switching overhead.

  • You should use an ExecutorService thread-pool so you are not forking and reaping threads every time. Thread startup/shutdown is not an insignificant process.

Gray
  • 115,027
  • 24
  • 293
  • 354
  • I'm running a Core i7 with 8 gb ram. The processing is completely CPU bound, just various comparisons to compare the images of each character, and updating the UI to show the current value (with a label.setText() ) – Ali Oct 07 '13 at 13:46
  • I'd start a fixed size thread-pool with double the threads as you have virtual processors (does your i7 have 8?). But performance testing will show you the optimal number of threads for the job. Best of luck @ClickUpvote. – Gray Oct 07 '13 at 13:48
  • I haven't checked my CPU specifically, but i7 is supposed to be able to handle 8 threads, so I'd say yes. – Ali Oct 07 '13 at 13:58
  • I seem to be getting optimal performance at 26 threads. Any more and it starts to slow down. – Ali Oct 07 '13 at 21:56
  • 1
    Sounds about right. I usually use 2x the num-threads but it depends entirely on the task at hand. Best of luck @ClickUpvote. – Gray Oct 07 '13 at 22:13