I want to install Slurm on localhost. I have already installed Slurm on a similar machine, where it works fine, but on this machine I get the following:
transgen@transgen-4:~/galaxy/tools/melanoma_tools$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
transgen-4-partition* up infinite 1 drain transgen-4
transgen@transgen-4:~/galaxy/tools/melanoma_tools$ sinfo -Nel
Fri Jun 25 17:42:56 2021
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
transgen-4 1 transgen-4-partition* drained 48 1:24:2 541008 0 1 (null) Low RealMemory
transgen@transgen-4:~/galaxy/tools/melanoma_tools$ srun -n8 sleep 10
srun: Required node not available (down, drained or reserved)
srun: job 5 queued and waiting for resources
^Csrun: Job allocation 5 has been revoked
srun: Force Terminated job 5
I found advice suggesting the following commands:
sudo scontrol update NodeName=transgen-4 State=DOWN Reason=hung_completing
sudo systemctl restart slurmctld slurmd
sudo scontrol update NodeName=transgen-4 State=RESUME
but they had no effect.
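The drain reason reported by sinfo is "Low RealMemory", which Slurm sets when the memory slurmd detects on the node is lower than the RealMemory value in slurm.conf. The following is a minimal sketch of that comparison, assuming a Linux host that exposes /proc/meminfo. The helper names kb_to_mb and check_realmemory are hypothetical, and 541008 is the value copied from the NodeName line of the slurm.conf shown below.

```shell
#!/bin/sh
# RealMemory from the NodeName line in slurm.conf (megabytes).
configured_mb=541008

# Convert a kB figure (the unit used by MemTotal in /proc/meminfo) to whole MB.
# Hypothetical helper, not part of Slurm.
kb_to_mb() {
    echo $(( $1 / 1024 ))
}

# Print "too high" if the configured value ($1) exceeds the detected one ($2),
# otherwise print "ok". Hypothetical helper, not part of Slurm.
check_realmemory() {
    if [ "$1" -gt "$2" ]; then
        echo "too high"
    else
        echo "ok"
    fi
}

# Read MemTotal (kB) from the kernel; fall back to 0 if it cannot be read.
detected_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo 2>/dev/null)
detected_mb=$(kb_to_mb "${detected_kb:-0}")

echo "configured=${configured_mb} MB detected=${detected_mb} MB: $(check_realmemory "$configured_mb" "$detected_mb")"
```

If this prints "too high", the configured RealMemory is probably larger than what the node really has. Running `slurmd -C` on the node prints the hardware values as slurmd itself detects them, and its figure may differ slightly from the MemTotal-based estimate above, so that output is likely the better source for the RealMemory setting.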
slurm.conf:
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
SlurmctldHost=localhost
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/cgroup
ReturnToService=1
SlurmctldPidFile=/var/run/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm.state
SwitchType=switch/none
TaskPlugin=task/cgroup
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
SchedulerType=sched/backfill
SelectType=select/cons_res
SelectTypeParameters=CR_Core
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/linux
#SlurmctldDebug=info
#SlurmctldLogFile=
#SlurmdDebug=info
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=transgen-4 NodeAddr=localhost CPUs=48 Sockets=1 CoresPerSocket=24 ThreadsPerCore=2 RealMemory=541008 State=UNKNOWN
PartitionName=transgen-4-partition Nodes=transgen-4 Default=YES MaxTime=INFINITE State=UP
cgroup.conf:
###
# Slurm cgroup support configuration file.
###
CgroupAutomount=yes
CgroupMountpoint=/sys/fs/cgroup
ConstrainCores=no
ConstrainDevices=yes
ConstrainKmemSpace=no #avoid known Kernel issues
ConstrainRAMSpace=no
ConstrainSwapSpace=no
TaskAffinity=no #use task/affinity plugin instead
How can I get Slurm working?
Thanks in advance.