I'm running 500 jobs on a Sun Grid Engine cluster, and I'm having some problems: the cluster administrator suspended my jobs because I was using more CPU than I had requested. The code is written in Java.
When I run one of the jobs on my PC (Ubuntu 14.04) and watch it with the htop
command, I see several entries for the same program. These are not separate processes but threads (htop shows userland threads by default; press H to toggle them). My code does not create any threads itself, so they are probably JVM-internal threads (the garbage collector, for example). The first problem is: when I run the same test on the cluster and use htop there, I see many more threads/processes, around 50 for a single job. Does anybody know why this might be happening?
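My guess (an assumption on my part) is that the JVM sizes pools such as the parallel GC workers and JIT compiler threads from the number of CPUs it detects, so a many-core cluster node starts far more threads than my desktop. One way to confirm the count is with ps; the snippet below inspects the current shell (`$$`) purely as a demo, and you would substitute the Java job's PID:

```shell
# Count a process's lightweight processes (threads) via ps's nlwp column.
# $$ (the current shell) is just a placeholder — use your Java job's PID.
nthreads=$(ps -o nlwp= -p $$ | tr -d ' ')
echo "thread count: $nthreads"

# For a JVM you can also list the thread names (jstack ships with the JDK):
#   jstack <pid> | grep '^"'
```

On a JVM most of those threads will have self-explanatory names like "GC task thread" or "C2 CompilerThread".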
I'm using the following options with qsub:
qsub -t 1-500 -l h_rt=05:00:00 -l h_cpu=05:00:00 -l h_vmem=6G -e /some_path/ -o /some_path/ -N all_runs -cwd -m as -M mail@mail ./run.sh
(All the jobs are specified in run.sh.)
With this qsub
command each job gets 1 slot, but CPU usage is sometimes 150–200%, so 1 slot is not enough. I saw that the cluster has a parallel environment, so more slots can be assigned to each job. This can be done by adding -pe smp 4
(or some other number) to the qsub
command.
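For what it's worth, when a job runs under a parallel environment, Grid Engine exports the granted slot count as $NSLOTS, so run.sh could size the JVM's thread pools to match the reservation. A sketch of what that might look like (myapp.jar and the exact flag values are illustrative, not from my actual script):

```shell
#!/bin/sh
# Sketch for run.sh: match JVM thread pools to the slots SGE granted.
# $NSLOTS is set by Grid Engine when the job runs under -pe smp N;
# default to 1 slot if it is unset.
SLOTS=${NSLOTS:-1}

java -XX:ParallelGCThreads="$SLOTS" \
     -XX:CICompilerCount=2 \
     -jar myapp.jar   # myapp.jar is a placeholder for the real program
```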
How do you know how many slots you need? And does -pe smp 4
strictly limit the job to a maximum of 4 slots' worth of CPU? I mean, when a job has 1 slot but uses 200% CPU, it can affect other users' jobs. I want to be sure that cannot happen.
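As far as I understand (this is an assumption, not something I found in the cluster docs), -pe smp 4 only reserves 4 slots for scheduling; it does not by itself cage the process unless the administrators enabled core binding or cgroup enforcement. If I wanted a hard guarantee myself, I could pin the job to specific cores with taskset from util-linux:

```shell
# Pin the command to core 0 only: however many threads the JVM starts,
# total CPU usage cannot exceed 100% of that one core.
taskset -c 0 java -jar myapp.jar   # myapp.jar is a placeholder
```

Newer Grid Engine versions also seem to have a qsub -binding option that does core binding on the scheduler side, but I don't know if this cluster supports it.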
If there is some important information missing please let me know and I'll add it.