SGE: Jobs stuck in qw state

Question

I'm trying to submit jobs to SGE. Submitting the same way has worked for me in the past, but now all jobs are stuck in the qw state.

"qstat -g c" output:

> CLUSTER QUEUE   CQLOAD   USED  AVAIL  TOTAL
> all.q           0.38      0    160   1920   
> gpu6.q          -NA-      0      0      4    
> par6.q          0.38    750    135   1800      
> seq6.q          0.41    103    170    416   
> smp3.q          1.01      0      0     96

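In the "qstat -g c" summary, AVAIL is the number of slots currently available in each cluster queue. A job that hard-requests a queue with AVAIL 0 (here gpu6.q and smp3.q) cannot start until slots free up. The helper below is hypothetical, not part of SGE. It is a minimal sketch that assumes the five-column layout shown above and lists queues with no free slots. Note that seq6.q and par6.q do report free slots, so this check alone would not explain jobs waiting in those queues.

```python
from typing import Dict, List, NamedTuple, Optional


class QueueSummary(NamedTuple):
    """One row of `qstat -g c` output (only the first five columns)."""
    name: str
    cqload: Optional[float]  # None when SGE prints -NA-
    used: int
    avail: int
    total: int


def parse_cluster_queues(text: str) -> Dict[str, QueueSummary]:
    """Parse `qstat -g c` output, assuming the column order
    CLUSTER QUEUE, CQLOAD, USED, AVAIL, TOTAL as in the post."""
    queues: Dict[str, QueueSummary] = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank/short lines, the header row and any "----" separator.
        if len(fields) < 5 or fields[0] == "CLUSTER" or fields[0].startswith("-"):
            continue
        name, load, used, avail, total = fields[:5]
        queues[name] = QueueSummary(
            name=name,
            cqload=None if load == "-NA-" else float(load),
            used=int(used),
            avail=int(avail),
            total=int(total),
        )
    return queues


def full_queues(queues: Dict[str, QueueSummary]) -> List[str]:
    """Names of cluster queues that report no available slots."""
    return sorted(q.name for q in queues.values() if q.avail == 0)


if __name__ == "__main__":
    sample = """\
CLUSTER QUEUE   CQLOAD   USED  AVAIL  TOTAL
all.q           0.38      0    160   1920
gpu6.q          -NA-      0      0      4
par6.q          0.38    750    135   1800
seq6.q          0.41    103    170    416
smp3.q          1.01      0      0     96
"""
    print(full_queues(parse_cluster_queues(sample)))  # ['gpu6.q', 'smp3.q']
```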
The plain "qstat" output looks the same as always.

Searching online only turned up fixes for people with root access, which I don't have. Any suggestions?

Thanks.

Edit: Jobs were submitted with "qsub -q seq6.q scriptname", or with smp3.q or par6.q in place of seq6.q.

As far as I can see, "qstat -j jobid" shows nothing unusual:

job_number:                 2821318
exec_file:                  job_scripts/2821318
submission_time:            Wed Mar  4 12:07:15 2015
owner:                      username
uid:                        31519
group:                      dch
gid:                        1150
sge_o_home:                 /home/hudson/pg/username
sge_o_log_name:             username
sge_o_path:                 /gpfs/hamilton6/apps/intel_comp_2014/composer_xe_2013_sp1.2.144/bin/intel64:/usr/local/bin:/bin:/usr/bin:/usr/lpp/mmfs/bin:/usr/local/Cluster-Apps/sge/6.1u6/bin/lx24-amd64:/panfs/panasas1.hpc.dur.ac.uk/apps/nag/fll6a21dpl/scripts
sge_o_shell:                /bin/tcsh
sge_o_workdir:              /panfs/panasas1.hpc.dur.ac.uk/username/path
sge_o_host:                 hamilton1
account:                    sge
mail_list:                  username@hamilton1
notify:                     FALSE
job_name:                   scriptname
jobshare:                   0
hard_queue_list:            seq6.q
env_list:                   
script_file:                scriptname
scheduling info:            (Collecting of scheduler job information is turned off)
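The last line of this output is the telling one: "Collecting of scheduler job information is turned off" means the scheduler is not recording why pending jobs were not dispatched (the scheduler configuration's schedd_job_info setting), so "qstat -j" cannot show a pending reason until an administrator enables it. The sketch below uses hypothetical helper names and assumes one "key: value" pair per line, as in the output above. It turns the output into a dictionary and collects the two facts relevant here: the queue the job is pinned to and whether scheduling info is available.

```python
from typing import Dict, List


def parse_qstat_j(text: str) -> Dict[str, str]:
    """Parse `qstat -j <jobid>` output into a dict.

    The key ends at the first colon, so values that themselves contain
    colons (e.g. sge_o_path) are kept intact.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        info[key.strip()] = value.strip()
    return info


def pending_hints(info: Dict[str, str]) -> List[str]:
    """Return human-readable hints for a job stuck in qw (illustrative only)."""
    hints: List[str] = []
    queue = info.get("hard_queue_list")
    if queue:
        hints.append(f"job is restricted to queue(s): {queue}")
    if "turned off" in info.get("scheduling info", ""):
        hints.append("scheduler job info is disabled; the pending reason is not recorded")
    return hints


if __name__ == "__main__":
    sample = """\
job_number:                 2821318
hard_queue_list:            seq6.q
scheduling info:            (Collecting of scheduler job information is turned off)
"""
    for hint in pending_hints(parse_qstat_j(sample)):
        print(hint)
```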

Agreed with Finch_Powers. Also, please edit the post to include the qsub command and options you used. It is difficult to solve this with so little information. — Vince, Mar 03 '15 at 20:31
The only thing I can think of is that your priority is being lowered to the point where your jobs wait, which makes no sense since slots are available. I would ask your sysadmin to help you out. — Vince, Mar 05 '15 at 20:57

score 3 · Answer 1 · answered Mar 11 '15 at 08:16

I had the same issue today. We run Univa Grid Engine for a customer. On the master host, I configured some complexes for jobs that request a lot of memory (h_stack=64M, memory_free=4G, virtual_free=4G). Since this change, jobs hang in the waiting queue. The same configuration with 3G worked for many years on all of our execution hosts. I will test the new 4G configuration over the next few days. All servers have enough memory! Ingo
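One thing worth checking with memory requests like these is whether the value each host or queue *reports* for a complex is at least the requested amount. For example, a host reporting virtual_free just under 4G cannot satisfy virtual_free=4G, even if it has plenty of physical memory. In SGE memory values, lowercase k/m/g are powers of 1000 and uppercase K/M/G are powers of 1024. The sketch below is illustrative only. The helper names and host values are hypothetical, and it assumes every complex uses the "<=" relation (request must not exceed what is offered), as the standard memory complexes do.

```python
from typing import Dict

# SGE memory multipliers: lowercase = powers of 1000, uppercase = powers of 1024.
_MULTIPLIERS = {
    "k": 1000, "K": 1024,
    "m": 1000**2, "M": 1024**2,
    "g": 1000**3, "G": 1024**3,
}


def to_bytes(value: str) -> int:
    """Convert an SGE-style memory value such as '4G' or '64M' to bytes."""
    value = value.strip()
    if value and value[-1] in _MULTIPLIERS:
        return int(float(value[:-1]) * _MULTIPLIERS[value[-1]])
    return int(float(value))


def fits(request: Dict[str, str], host_offers: Dict[str, str]) -> bool:
    """True if every requested amount is <= what the host offers.

    A complex the host does not report at all counts as not satisfiable.
    """
    for name, amount in request.items():
        if name not in host_offers:
            return False
        if to_bytes(amount) > to_bytes(host_offers[name]):
            return False
    return True


if __name__ == "__main__":
    # Hypothetical host values, for illustration only.
    host = {"h_stack": "128M", "virtual_free": "3.9G"}
    print(fits({"h_stack": "64M", "virtual_free": "4G"}, host))  # False
    print(fits({"h_stack": "64M", "virtual_free": "3G"}, host))  # True
```

With the same hypothetical host, the old 3G request is satisfiable while the new 4G request is not, so such a job would stay in qw without any visible error.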
