Java- FixedThreadPool with known pool size but unknown workers

Question

So I think I sort of understand how fixed thread pools work (using Executors.newFixedThreadPool, built into Java), but from what I can see, there's usually a set number of jobs you want done, and you know how many when you start the program. For example:

int numWorkers = Integer.parseInt(args[0]);
int threadPoolSize = Integer.parseInt(args[1]);
ExecutorService tpes =
    Executors.newFixedThreadPool(threadPoolSize);
WorkerThread[] workers = new WorkerThread[numWorkers];
for (int i = 0; i < numWorkers; i++) {
    workers[i] = new WorkerThread(i);
    tpes.execute(workers[i]);
}

Each WorkerThread does something really simple; that part is arbitrary. What I want to know is: what if you have a fixed pool size (say, 8 max) but you don't know until runtime how many workers you'll need to finish the task?

The specific example is: If I have a pool size of 8 and I'm reading from standard input. As I read, I split the input into blocks of a set size. Each one of these blocks is given to a thread (along with some other information) so that they can compress it. As such, I don't know how many threads I'll need to create as I need to keep going until I reach the end of the input. I also have to somehow ensure that the data stays in the same order. If thread 2 finishes before thread 1 and just submits its work, my data will be out of order!

Would a thread pool be the wrong approach in this situation, then? It seems like it'd be a great fit (since I can't use more than 8 threads at a time).

Basically, I want to do something like this:

ExecutorService tpes = Executors.newFixedThreadPool(threadPoolSize);
BufferedInputStream inBytes = new BufferedInputStream(System.in);
byte[] buff = new byte[BLOCK_SIZE];
byte[] dict = new byte[DICT_SIZE];
WorkerThread worker;
int bytesRead = 0;

while((bytesRead = inBytes.read(buff)) != -1) {
   System.arraycopy(buff, BLOCK_SIZE-DICT_SIZE, dict, 0, DICT_SIZE);
   worker = new WorkerThread(buff, dict);
   tpes.execute(worker);
}

This is not working code, I know, but I'm just trying to illustrate what I want.

I left out a bit, but note that buff and dict keep changing and that I don't know how long the input is. I don't think I can actually do this, though, because worker already exists after the first call! I can't just say worker = new WorkerThread a bunch of times, since isn't it already pointing to an existing thread (true, a thread that might be dead)? And obviously, if this implementation did work, I wouldn't be running in parallel. But my point is, I want to keep creating threads until I hit the max pool size, wait until a thread is done, then keep creating threads until I hit the end of the input.

I also need to keep stuff in order, which is the part that's really annoying.

Do you really need parallelism? You don't know how long the input is. Threads perform best when they work on different resources. So you could consider the following option: 1 thread that reads input and 1 thread that does the computation. If the end user finds out that this has unacceptable performance, then you may split the computation in multiple threads in the future. — ignis, Oct 28 '12 at 16:17
I think I understand what you mean. I think ideally that would be best, although this is something I'm trying to sort out as part of a larger school assignment. — user1777900, Oct 28 '12 at 16:39
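A minimal sketch of ignis's suggestion (one thread reads, one thread computes), assuming a tiny hypothetical BLOCK_SIZE and using an identity copy into `out` as a stand-in for the real compression; `ReadThenCompute` and `run` are illustrative names. A single-thread executor runs its tasks one at a time from a FIFO queue, so the blocks come out in the order they were read.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ReadThenCompute {
    // Hypothetical block size, kept tiny for the demo.
    static final int BLOCK_SIZE = 4;

    // The calling thread reads blocks; one background thread does all the "work".
    static byte[] run(InputStream in) throws IOException, InterruptedException {
        ExecutorService compute = Executors.newSingleThreadExecutor();
        // ByteArrayOutputStream's write/toByteArray are synchronized.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buff = new byte[BLOCK_SIZE];
        int n;
        try {
            while ((n = in.read(buff)) != -1) {
                // Copy, because the next read overwrites buff.
                byte[] block = Arrays.copyOf(buff, n);
                // Placeholder for real compression: append the block unchanged.
                compute.execute(() -> out.write(block, 0, block.length));
            }
        } finally {
            compute.shutdown(); // queued blocks still get processed
            compute.awaitTermination(1, TimeUnit.MINUTES);
        }
        return out.toByteArray();
    }
}
```

Because only one thread does the computation, ordering comes for free; the cost is that the computation never runs on more than one core.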

score 1 · Answer 1 · answered Oct 28 '12 at 16:29

As @ignis points out, parallel execution may not be the best answer for your situation. However, to answer the more general question, there are several other Executor implementations to consider beyond FixedThreadPool, some of which may have the characteristics that you desire.

As far as keeping things in order, typically you would submit tasks to the executor, and for each submission, you get a Future (which is an object that promises to give you a result later, when the task finishes). So, you can keep track of the Futures in the order that you submitted tasks, and then when all tasks are done, invoke get() on each Future in order, to get the results.
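A minimal sketch of the Future-based approach, assuming each block is a String and that uppercasing stands in for the real per-block work (`OrderedFutures` and `processAll` are hypothetical names). With standard input, the loop over `blocks` would simply become the `read` loop from the question, adding one Future per block read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class OrderedFutures {
    // Submits one task per block, then collects results in submission order,
    // no matter which task actually finishes first.
    static List<String> processAll(List<String> blocks, int poolSize)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String block : blocks) {
                // Placeholder work standing in for compression.
                futures.add(pool.submit(() -> block.toUpperCase()));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                // Blocks until this particular task is done.
                results.add(f.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `get()` in list order means a slow early block delays collecting later ones, but it never reorders them.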

score 1 · Accepted Answer · answered Oct 28 '12 at 16:37


Your solution is completely fine (the only point is that parallelism is perhaps not necessary if the workload of your WorkerThreads is very small).

With a thread pool, the number of submitted tasks is not relevant. There may be fewer or more tasks than threads in the pool; the thread pool takes care of that.
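To make this concrete, here is a small sketch (hypothetical class `ManyTasksFewThreads`) that submits many more tasks than the pool has threads and records which threads ran them. All tasks complete, yet no more than `poolSize` distinct threads are ever used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class ManyTasksFewThreads {
    // Returns {tasksCompleted, distinctThreadsUsed}.
    static int[] run(int taskCount, int poolSize) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger completed = new AtomicInteger();
        Set<String> threadNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            // execute() only queues the task; it does not wait for earlier ones.
            pool.execute(() -> {
                threadNames.add(Thread.currentThread().getName());
                completed.incrementAndGet();
            });
        }
        pool.shutdown(); // accept no new tasks; already queued ones still run
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return new int[] {completed.get(), threadNames.size()};
    }
}
```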

However, and this is important: you rely on some kind of order in the results of your WorkerThreads, but when using parallelism, this order is not guaranteed! It doesn't matter whether you use a thread pool, how many worker threads you have, etc.; your tasks may always finish in an arbitrary order!

To keep the order right, give each WorkerThread the number of the current item in its constructor, and let them put their results in the right order after they are finished:

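import java.util.Arrays;

int noOfWorkItem = 0;
while((bytesRead = inBytes.read(buff)) != -1) {
   System.arraycopy(buff, BLOCK_SIZE-DICT_SIZE, dict, 0, DICT_SIZE);
   // pass copies: buff and dict are overwritten on the next iteration,
   // possibly while an earlier task is still waiting in the pool's queue
   worker = new WorkerThread(Arrays.copyOf(buff, bytesRead), dict.clone(), noOfWorkItem++);
   tpes.execute(worker);
}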
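One way to "put the results in the right order" is to have each task file its result under its item number in a sorted concurrent map, then read the map in key order once the pool has finished. This is a sketch under stated assumptions: `IndexedWorker`, `IndexedResults`, the tiny BLOCK_SIZE and the byte-reversal placeholder (standing in for compression) are all hypothetical. `ConcurrentSkipListMap` keeps its keys sorted, so iterating `values()` yields results in submission order regardless of completion order.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class IndexedResults {
    static final int BLOCK_SIZE = 4; // hypothetical, tiny for the demo

    // Hypothetical worker: processes one block and stores the result under its index.
    static final class IndexedWorker implements Runnable {
        private final byte[] block;
        private final int index;
        private final ConcurrentSkipListMap<Integer, byte[]> results;

        IndexedWorker(byte[] block, int index, ConcurrentSkipListMap<Integer, byte[]> results) {
            this.block = block;
            this.index = index;
            this.results = results;
        }

        @Override
        public void run() {
            // Placeholder for real compression: reverse the block's bytes.
            byte[] out = new byte[block.length];
            for (int i = 0; i < block.length; i++) {
                out[i] = block[block.length - 1 - i];
            }
            results.put(index, out);
        }
    }

    static byte[] run(InputStream in, int poolSize) throws IOException, InterruptedException {
        ExecutorService tpes = Executors.newFixedThreadPool(poolSize);
        ConcurrentSkipListMap<Integer, byte[]> results = new ConcurrentSkipListMap<>();
        byte[] buff = new byte[BLOCK_SIZE];
        int bytesRead;
        int noOfWorkItem = 0;
        try {
            while ((bytesRead = in.read(buff)) != -1) {
                // Each task gets its own copy of the block plus its item number.
                tpes.execute(new IndexedWorker(Arrays.copyOf(buff, bytesRead), noOfWorkItem++, results));
            }
        } finally {
            tpes.shutdown();
            tpes.awaitTermination(1, TimeUnit.MINUTES);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] r : results.values()) { // ascending item-number order
            out.write(r, 0, r.length);
        }
        return out.toByteArray();
    }
}
```

For example, the input "abcdefghij" is split into "abcd", "efgh" and "ij", and the output is "dcbahgfeji" even if the last block finishes first.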

answered Oct 28 '12 at 16:37

Philipp Wendler


Ah, thanks for the tip about the ordering! Although I'm a bit confused: with this implementation, wouldn't I only be running one thread at a time? Ideally I wanted to start up as many threads as possible until my Executor tells me the pool is full. I might be understanding it wrong, but I thought this method meant "start a thread, execute it, when it's done execute the next thread". Since worker only ever references one object at a time, if you create a new WorkerThread again, don't you have to wait until the previous WorkerThread dies? – user1777900 Oct 28 '12 at 16:42
No, you are missing some understanding here. A thread pool works completely differently. The `execute` method just puts a work task (your `WorkerThread` here) into a list of "todo" items. In the background, all the worker threads of the pool (you have 8 here) keep checking this list, and as soon as they find something in it, they take it out and run the work item. – Philipp Wendler Oct 28 '12 at 17:01
Oooh, ok. So basically, this part of the method is just the Executor saving assignment information, like "the thread doing this assignment needs such-and-such information". I think I was confused because I thought I HAD to call new WorkerThread 8 times to fill up the thread pool first. So even though I only have one variable (worker) holding each new WorkerThread(), each one is an assignment given to the executor and not an actual thread itself? Is that a bit closer to what's happening? I can tell I'm still not getting it completely, but this is a big help. – user1777900 Oct 28 '12 at 17:37
Yes, this is basically true. You can read a bit more here: https://en.wikipedia.org/wiki/Thread_pool A thread pool is a general concept that works similarly in all languages; Java's thread pool is nothing special. – Philipp Wendler Oct 28 '12 at 17:59
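To illustrate the "todo list" picture described in these comments, here is a deliberately simplified, hypothetical pool (`ToyPool` is an illustrative name, not a JDK class; real code should use Executors.newFixedThreadPool). `execute` only enqueues a task, and a fixed set of threads repeatedly takes tasks from a blocking queue and runs them, so creating many task objects never creates more threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

final class ToyPool {
    private final BlockingQueue<Runnable> todo = new LinkedBlockingQueue<>();
    private final List<Thread> workers = new ArrayList<>();
    // Marker task that tells a worker thread to exit.
    private static final Runnable STOP = () -> { };

    ToyPool(int size) {
        for (int i = 0; i < size; i++) {
            Thread t = new Thread(() -> {
                try {
                    while (true) {
                        Runnable task = todo.take(); // wait for the next "todo" item
                        if (task == STOP) {
                            return;
                        }
                        task.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers.add(t);
            t.start();
        }
    }

    // Like ExecutorService.execute: only adds the task to the queue, never waits.
    void execute(Runnable task) {
        todo.add(task);
    }

    // Queues one STOP per worker behind all existing tasks, then joins every worker.
    void shutdownAndWait() throws InterruptedException {
        for (int i = 0; i < workers.size(); i++) {
            todo.add(STOP);
        }
        for (Thread t : workers) {
            t.join();
        }
    }
}
```

Because the queue is FIFO, every real task is taken before any STOP marker, so all submitted work runs before the workers exit.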
