
I have a piece of code I want to optimize. The program iterates over a list of hosts, hands each host to a worker thread that connects to it via SSH to validate a few things, and then saves the results in an object.

Right now the code takes a long time (20 minutes) to run through 8000 hosts. I want to make full use of the available CPU cores and memory so that it finishes faster.

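import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;

// Results collected from every worker.
ConcurrentLinkedQueue<Host> hosts = new ConcurrentLinkedQueue<Host>();

List<FutureTask<Host>> taskList = new ArrayList<FutureTask<Host>>();
final int cpus = Runtime.getRuntime().availableProcessors();
// Default to 1000 threads; with 2 or fewer CPUs, use 4 threads per CPU instead.
int threadCount = 1000;
if (cpus <= 2) {
    threadCount = cpus * 4;
}

ExecutorService executor = Executors.newFixedThreadPool(threadCount);
// Worker is a Callable<Host>; one task is submitted per resource.
for (String resource : resources) {
    FutureTask<Host> task = new FutureTask<Host>(new Worker(resource));
    taskList.add(task);
    executor.submit(task);
}

System.out.println("Processing... Waiting for all threads to finish...");

// Block on each task in submission order and collect its result.
for (FutureTask<Host> futureTask : taskList) {
    try {
        hosts.add(futureTask.get());
    } catch (InterruptedException e) {
        // The waiting thread was interrupted before this result arrived.
        e.printStackTrace();
    } catch (ExecutionException e) {
        // The worker itself threw; the cause is wrapped in the exception.
        e.printStackTrace();
    }
}
executor.shutdown();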
Klaus
  • I expect that each thread is spending most of its time waiting on network I/O completion. How heavy is each thread, that is, roughly how much memory and processor time does each one use? Do you have any actual usage measurements? See also https://stackoverflow.com/questions/763579/how-many-threads-can-a-java-vm-support and https://stackoverflow.com/questions/7726871/maximum-number-of-threads-in-a-jvm – Richard Chambers Nov 07 '17 at 18:50
  • How many CPUs does your test machine have? The results might be significantly different between 8 threads and 1000 threads, both of which seem to be possibilities in your code. – Warren Dew Nov 07 '17 at 18:53
  • @RichardChambers my threads are not CPU bound; most of them only perform simple UNIX commands, so yes, the threads spend most of their time waiting on network I/O. I don't have actual measurements. Any recommendation on how to get those metrics? – Klaus Nov 07 '17 at 19:03
  • @WarrenDew my machine has 4 CPUs – Klaus Nov 07 '17 at 19:03
  • See https://stackoverflow.com/questions/9550174/measure-java-program-performance for a discussion of measuring Java program performance. However, if your Java threads are mostly executing UNIX commands, then the real work is being done by the UNIX commands and not the threads. Are the UNIX commands run on the remote host that you are connected to? If most of the work is done by UNIX commands on a remote host, you may be able to just spin up a couple thousand threads. The other question is whether any resources, such as a log file, are shared between the threads. – Richard Chambers Nov 07 '17 at 19:10
  • Is the SSH library you are using capable of performing concurrent connections? – Robert Nov 07 '17 at 19:12
  • Cool, I will try JProfiler @RichardChambers. Yes, the UNIX commands run on the remote hosts. No, there is no shared data between the threads: I only push the results into a concurrent queue, and I pass the threads the information they need via the constructor (I show only one parameter here for brevity). – Klaus Nov 07 '17 at 19:15
  • @Robert Yes, I think so. I'm using SSHJ. – Klaus Nov 07 '17 at 19:18
  • If it takes 20 minutes for 8000 hosts and you are running 1000 threads, then a rough guess is that each SSH session takes less than 3 minutes to run? What if you log how much wall-clock time each session takes? If these are lightweight threads that spend most of their time waiting on UNIX commands executing on a remote host, it would be interesting to see what happens if you jump from 1000 threads to 4000 threads. One question is how many threads are actually spun up and running concurrently. And how long does each SSH session last? – Richard Chambers Nov 07 '17 at 20:47
  • @RichardChambers on average one thread takes 30 seconds (complete SSH session). In theory there are 1000 in the threadpool but I've seen JProfiler record only from 6 to 20 threads running concurrently. I did a takes with 2000 hosts and 2000 it is pretty slow... more than 20 minutes. – Klaus Nov 10 '17 at 21:10
  • I am a bit confused by your comment of "I did a takes with 2000 host and 2000 it is pretty slow... more than 20 minutes." I must be missing something here because I read in your original posted question that 8,000 hosts take 20 minutes and now you are saying 2,000 hosts takes more than 20 minutes. I think that until you sit down and rewrite your question with more details about what is actually happening I can't help you. Perhaps someone else can fill in the gaps. – Richard Chambers Nov 11 '17 at 05:09
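
The suggestion in the comments to log the wall-clock time of each session, and to check how many tasks really run at once, could be sketched roughly as follows. This is only a sketch under stated assumptions: `SessionTiming`, `timed`, `run` and `impliedConcurrency` are hypothetical names that do not appear in the question, and the SSH session is replaced by a plain `Thread.sleep`, so the output says nothing about SSHJ itself. The `impliedConcurrency` helper applies Little's law (average concurrency = throughput × average session time). Taking the figures from the thread at face value, 8000 hosts × 30 s / 1200 s works out to about 200 sessions in flight on average, well below the 1000 threads in the pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SessionTiming {
    // Number of tasks currently inside call(), and the highest value observed.
    static final AtomicInteger running = new AtomicInteger();
    static final AtomicInteger peak = new AtomicInteger();

    /** Wraps a task so that it returns its own wall-clock time in milliseconds. */
    static <T> Callable<Long> timed(Callable<T> task) {
        return () -> {
            int now = running.incrementAndGet();
            peak.accumulateAndGet(now, Math::max);
            long start = System.nanoTime();
            try {
                task.call();
            } finally {
                running.decrementAndGet();
            }
            return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        };
    }

    /** Runs taskCount stand-in tasks on a fixed pool; returns each task's duration. */
    static List<Long> run(int taskCount, int threadCount, long sleepMillis) throws Exception {
        running.set(0);
        peak.set(0);
        ExecutorService executor = Executors.newFixedThreadPool(threadCount);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                // Assumption: a sleep stands in for the real SSH session.
                tasks.add(timed(() -> { Thread.sleep(sleepMillis); return null; }));
            }
            List<Long> durations = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<Long> f : executor.invokeAll(tasks)) {
                durations.add(f.get());
            }
            return durations;
        } finally {
            executor.shutdown();
        }
    }

    /** Little's law: average number of sessions in flight over the whole run. */
    static double impliedConcurrency(int hosts, double avgSessionSeconds, double totalSeconds) {
        return hosts * avgSessionSeconds / totalSeconds;
    }

    public static void main(String[] args) throws Exception {
        List<Long> durations = run(40, 8, 50);
        System.out.println("tasks: " + durations.size() + ", peak concurrency: " + peak.get());
        System.out.println("implied concurrency: " + impliedConcurrency(8000, 30, 1200));
    }
}
```

If the measured peak stays far below the configured pool size, or the implied concurrency is much lower than the thread count, the bottleneck is probably somewhere other than the number of threads.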
