2

I want to improve the performance of my application, and found that it spends about 90% of its running time doing one of my while loops. What I basically do in this while loop is the following.

int i = 0;
while (i < 100)
  1) Search a big arrayList for position of an objects timestamp.
  2) Search the same arrayList for position of another objects timestamp.
  3) I get this subArrayList (or timewindow).
  4) The array that now is returned I iterate through and compute an average.
  5) I push this average into a stack.
  i++
endwhile

One iteration of this loop takes on average 1-10 milliseconds, and this whole part usually takes from 100-1000 milliseconds. From what I can understand, even tho the 99th sublist only takes about 1 ms to complete, it will have waited 99-9999 ms before it got its chance to do that, right, or am I way off here?

What I had in mind was spawning a thread and have it return its value on position i. When all threads where done, continue program.

I don't care if the average of timewindow x returns before average of timewindow y, only that all the threads/values have returned before I continue.

I got the following questions:

Would it be worthwhile to try and make every iteration a thread of some sort and compute it in parallel?

If so, would I need a thread pool, and what is the best way to go about doing that?

NightDog
  • 135
  • 1
  • 7
  • Edited my question to change from array to arrayList and some clarification. – NightDog Feb 22 '11 at 10:35
  • Meh, now it’s totally different. -.- :) If 1) and 2) are 2 steps, you can make them one and only iterate over the list once (and break when both found). On 3): You don’t need to “get” the subarraylist. You have found 2 indices and can just use your existing arraylist with those. On 2) If you iterate over it to find it anyway, why not start computing the average on first-find and stop on second-find? That’ll make steps 1-4 still only one iteration, with potentially early break. That way your performance will have greatly improved even without concurrency. – Kissaki Feb 22 '11 at 10:42
  • Also, I think you broke your numbering in your code? 1, 2, 3, 2, 3? – Kissaki Feb 22 '11 at 10:43
  • Fixed numbering, and sorry for editing. I thought I had it well enough prepared before I posted, but I sure did not :) – NightDog Feb 22 '11 at 10:45
  • Could you describe more preciously that "big array list" of timestamps, is it sorted? what is the approximate size of it, 100, 1000 or a million? – jnr Feb 22 '11 at 10:58
  • It depends, but could be from 10000 - 50000. – NightDog Feb 22 '11 at 10:59
  • I thank you all for helping me out on this. I guess the consensus is to optimize my code before I go for threading. So that's what I'll do. And yes, the big array can be counted on to be sorted in a descending order (Pos 0 = 1000, pos 1 = 999), so I'll look into getting myself a binary sort algorithm working on the objects. – NightDog Feb 22 '11 at 11:05

6 Answers6

4

The problem with thread is... it is a heavy object to initiate and synchronize. So one second operation might be not worth it.

Take a look at Actor model pattern. For Java you can use Akka. With actor, you can do concurrent operation lightly.

nanda
  • 24,458
  • 13
  • 71
  • 90
  • And how does Akka do it? Threads are the most lightweight option for concurrency. Processes are more expensive. And as I read it, Akka runs remote (“remote actors”)? So in a separate process or separate machine/VM even. It may be easier to implement Akka, but it’s definitely not lighter, as anything doing concurrency will have to base on processes or threads, as those are the low-level things a processor can actually execute. – Kissaki Feb 22 '11 at 10:37
  • I'm not saying that using Akka you won't use thread, but it will be encapsulated in a sense that threads may be reused and so on and so on. – nanda Feb 22 '11 at 10:59
3

I get sublist i from a big array.

Why cant you use the same array to compute the average.you know the index start and end position .Run another while loop inside the parent while to compute average.

if you go for threading , i have following question.

which part of the block you want to run parallel

what about Synchronization? , multliple thread writing to the stack.

Andreas Dolk
  • 113,398
  • 19
  • 180
  • 268
Dead Programmer
  • 12,427
  • 23
  • 80
  • 112
  • I guess I misspelled that array, it's suppouse to be arrayList, it contains objects that got a value and a timestamp. What I do is search for two timestamps and then return the apropriate window, then parse this arrayList getting the objects value and compute an avrage from that. – NightDog Feb 22 '11 at 10:30
  • @Karl you have indexOf() method in arrayList, which will provide you the index of timestamp is stored in arrayList.but you need to override equals method. – Dead Programmer Feb 22 '11 at 10:40
1

There is significant overhead for coordinating threads, and they do not help performance at all unless they allow multiple cores to get cranking, or if you can overlap computation with I/O.

Before considering a design change, why not find out what bottlenecks you have, and fix them? Here's a simple way to do that. Often you can find large speedups that way.

Community
  • 1
  • 1
Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

How do you get your sublist from the big array? A simple, already efficient improvement would be to iterate through the array only once, selecting the elements and adding their to an iteratively calculated average.

Something like :

int sum = 0;
int count = 0;
for(MyObject object : myBigArray){
    if (mustBelongToSublist(object){
        count++;
        sum += object.value();
    }
}
int average = (double)sum/count;
Dunaril
  • 2,757
  • 5
  • 31
  • 53
  • I think it’s an array of lists. And he selects one such sublist from the array. That’s how I understood it. As he says `sublist i from a big array`, he explicitly indexes it. – Kissaki Feb 22 '11 at 10:31
  • The edit makes me even more perplex about the way the OP gets his sublist. – Dunaril Feb 22 '11 at 10:43
0

Depends on your goal. Should it run in less then a second? May the data grow (a lot)?

Threading is only applicable if you can efficiently create sub-tasks. For example, if your list you iterate through would be a linked list, it may not be applicable with cheap computations on each element as, for a sub-task, navigating to subparts of the list is expensive. If you have individual lists you each have to iterate over it may just be good though, as you’ll start to iterate from the beginning of the list anyway.

In your case, sure. Why not. You’ll have to decide on a strategy on what to do with your results though. Should they be placed in order in your stack? Or does it not matter? Do you want to thread and then let the thread wait until those in front finish? Or do you use a different strategy? Waiting threads is not good though, ofc. If you can create 2 or 4 threads and each can continually work, that’s when it becomes very performant.

As you iterate i from 0 to 100 and each iteration does not depend on another iteration you’re good to separate those into sub-tasks. You have 100 tasks. Those can be split amongst threads.

Don’t overkill with threads, you only have a limited amount of CPUs and only 100 tasks, so 2 or 4 threads should be enough. Make the threads and tell them to compute the content of your while for the indices e.g. 25 to 50.

Kissaki
  • 8,810
  • 5
  • 40
  • 42
0

I like answers gven by @Suesh and @nanda and would like to summarize.

First, optimize your code. I think that at least half of the time of the operation you are spending on copying of the sub array. You have to work in place: find the first and last index of the array and calculate average of the elements. The best solution requires just one run on the array. If it is impossible (I do not know what is your logic to find the indexes) in the worse case you have to iterate the array twice. But do not copy its content.

If this optimization does not help enough think about thread pool or actors model using Akka (as was suggested by nanda)

AlexR
  • 114,158
  • 16
  • 130
  • 208