
I have a simple test.ksh that I am running with the command:

sbatch test.ksh

I keep getting "JobState=FAILED Reason=NonZeroExitCode" (as reported by "scontrol show job").

I have already made sure of the following:

  1. slurmd and slurmctld are up and running correctly
  2. The permissions on "test.ksh" are 777.
  3. The command "srun test.ksh" (by itself, without using sbatch) succeeds without problems
  4. I tried putting in a "return 0" in the last line of "test.ksh" without luck
  5. I tried putting in an "exit 0" in the last line of "test.ksh" without luck
  6. I tried putting in "hostname" in the last line of "test.ksh" without luck
  7. I tried putting in "srun hostname" in the last line of "test.ksh" without luck
user3200387

2 Answers


I found out that I hadn't set --error and --output, which meant that the output and error files defaulted to the current directory from which I was issuing the command.

The problem was that I didn't have sufficient privileges to write to the current directory.

The solution was to point --error and --output at files in a directory where I had write privileges.
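A minimal sketch of that fix, assuming a POSIX shell: it checks that a log directory is writable before building the sbatch command line. The helper name sbatch_cmd_for and the .out/.err file names are hypothetical, chosen only for illustration; --output, --error and the %j job-ID placeholder are standard sbatch features.

```shell
#!/bin/sh
# Hypothetical helper (not part of Slurm): print an sbatch command that
# writes its logs into a directory we can actually write to.
sbatch_cmd_for() {
    log_dir=$1   # directory that should receive the job's log files
    script=$2    # batch script to submit, e.g. test.ksh

    if [ -d "$log_dir" ] && [ -w "$log_dir" ]; then
        # %j is replaced by Slurm with the job ID when the job runs.
        printf 'sbatch --output=%s/%%j.out --error=%s/%%j.err %s\n' \
            "$log_dir" "$log_dir" "$script"
    else
        echo "cannot write to $log_dir; choose another directory" >&2
        return 1
    fi
}
```

For example, running sbatch_cmd_for "$HOME/logs" test.ksh prints the command to run if that directory exists and is writable, and returns a non-zero status otherwise.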

user3200387

In my case it was because the folder was owned by root while I was submitting the job as a different user. I had made the mistake of creating the folder as root inside that user's home folder. Running "chown user:usergroup foldername" fixes the problem.
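One way to spot this situation before submitting, sketched under the assumption of a POSIX shell with ls and awk available, is to compare the folder's numeric owner with the current user's ID. The function name dir_owner_uid is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the numeric UID that owns a directory.
# "ls -ldn" lists the directory itself with numeric owner and group;
# the owner UID is the third field.
dir_owner_uid() {
    ls -ldn "$1" | awk '{print $3}'
}

# Warn if the current directory belongs to someone else (for example root).
if [ "$(dir_owner_uid .)" != "$(id -u)" ]; then
    echo "current directory is owned by UID $(dir_owner_uid .), not by you" >&2
fi
```

If the warning appears, the chown command above (run with sufficient privileges) hands the folder back to the submitting user.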

stats con chris