2

I have a user account on a supercomputer where jobs are handled with Slurm.

I would like to know the total number of CPU hours that I have consumed on this supercomputer. I think that's an understandable question, because there is only a limited number of CPU hours available per project. I'm surprised that an answer is not easy to find.

I know that there are commands like sacct, sreport, sshare, etc., but it seems that there is no simple command that displays the used CPU hours.

Can someone help me out?

thyme
  • 1
    The first answer -> https://stackoverflow.com/questions/24020420/find-out-the-cpu-time-and-memory-usage-of-a-slurm-job. For CPU time and memory, CPUTime and MaxRSS are probably what you're looking for. cputimeraw can also be used if you want the number in seconds, as opposed to the usual Slurm time format. – Odyssee Feb 02 '19 at 10:33
  • Thank you for the quick response. I know this command, but it just gives me a list where CPUTime and MaxRSS are listed per running job. I am interested in the TOTAL amount of used CPU hours for all jobs I have ever submitted. – thyme Feb 02 '19 at 10:47
  • 2
    With `sacct` you get the list of seconds, and with a simple `awk` script (or any other language) you can add up all the seconds used to a grand total. There's no SLURM command to do your query directly. Maybe the supercomputer's operators have a tool to extract this data, in that case, ask them. Otherwise, you have to compute it yourself by querying the SLURM DB with `sacct`. – Poshi Feb 02 '19 at 13:06

2 Answers

2

As others have commented, sacct should give you that information. Check the man page for how to query past jobs. You can specify --starttime and --endtime to restrict your query to your allocation period, as it ends or renews. The -l option returns more information than you need, so you can select just the fields you want with --format.
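A minimal sketch of how the total could be added up, assuming your cluster's sacct supports the CPUTimeRAW field (CPU time in seconds) and the -X option (one line per job allocation, without the job steps). The function name `sum_cpu_hours` and the dates are placeholders of mine, not something Slurm provides:

```shell
# Sum a column of CPU seconds (one value per line on stdin) and print hours.
# sum_cpu_hours is a hypothetical helper name, not a Slurm command.
sum_cpu_hours() {
    awk '{ total += $1 } END { printf "%.2f\n", total / 3600 }'
}

# Hypothetical usage; adjust the dates to your allocation period:
#   sacct -u "$USER" -X -n -P -S 2019-01-01 -E 2019-12-31 -o cputimeraw | sum_cpu_hours
```

Restricting the query to allocations with -X avoids counting the job steps on top of the job they belong to.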

In your instance, the correct answer is to ask the administrators. You have been given an allocation of time to draw from. They likely have a system that will show you your balance and you can reconcile your balance against the output of sacct. Also, if the system you are using has different node types such as high memory, GPU, MIC, or old, they will likely charge you differently for those resources.

chuck
2

You can get an overview of the used CPU hours with the following:

sacct -SYYYY-mm-dd -u username -ojobid,nodelist,state,start,end,alloccpus,cputime | column -t

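Summing the CPUTime column gives the total accounting units (SBU in our system). CPUTime is already the elapsed wall-clock time multiplied by the number of allocated CPUs (AllocCPUS), so it does not need to be multiplied by the CPU count again. In the example below, job 6328552 ran for 2 days 02:40:18 on 240 CPUs, which is the 506-17:12:00 shown in its CPUTime column.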

An example:

JobID         NodeList         State       Start                End                  AllocCPUS   CPUTime
------------  ---------------  ----------  -------------------  -------------------  ----------  ----------
6328552       tcn[595-604]     CANCELLED+  2019-05-21T14:07:57  2019-05-23T16:48:15  240         506-17:12:00
6328552.bat+  tcn595           CANCELLED   2019-05-21T14:07:57  2019-05-23T16:48:16  24          50-16:07:36
6328552.0     tcn[595-604]     FAILED      2019-05-21T14:10:37  2019-05-23T16:48:18  240         506-06:44:00
6332520       tcn[384,386,45+  COMPLETED   2019-05-23T16:06:04  2019-05-24T00:26:36  72          25-00:38:24
6332520.bat+  tcn384           COMPLETED   2019-05-23T16:06:04  2019-05-24T00:26:36  24          8-08:12:48
6332520.0     tcn[384,386,45+  COMPLETED   2019-05-23T16:06:09  2019-05-24T00:26:33  60          20-20:24:00
6332530       tcn[37,41,44,4+  FAILED      2019-05-23T17:11:31  2019-05-25T09:13:34  240         400-08:12:00
6332530.bat+  tcn37            FAILED      2019-05-23T17:11:31  2019-05-25T09:13:34  24          40-00:49:12
6332530.0     tcn[37,41,44,4+  CANCELLED+  2019-05-23T17:11:35  2019-05-25T09:13:34  240         400-07:56:00
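The CPUTime column above uses Slurm's [D-]HH:MM:SS format, which is awkward to add up by hand. As a rough sketch, a small awk wrapper could convert such values to hours. The function name `slurm_time_to_hours` is hypothetical, and the parsing assumes the time formats shown in the example:

```shell
# Convert Slurm time strings ([D-]HH:MM:SS, one per line on stdin) to hours.
# slurm_time_to_hours is a hypothetical helper name, not a Slurm command.
slurm_time_to_hours() {
    awk 'NF == 0 { next }
    {
        days = 0; t = $1
        # Split off the optional "D-" day prefix.
        if (index(t, "-") > 0) { split(t, d, "-"); days = d[1]; t = d[2] }
        n = split(t, p, ":")
        if (n == 3)      secs = p[1] * 3600 + p[2] * 60 + p[3]
        else if (n == 2) secs = p[1] * 60 + p[2]
        else             secs = p[1]
        printf "%.2f\n", days * 24 + secs / 3600
    }'
}

# Example with the first job of the table:
#   echo 506-17:12:00 | slurm_time_to_hours
```

When totalling, count only the job lines (JobID without a dot); the step lines such as 6328552.0 cover the same allocation again.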

The available fields are listed in the man page. They can be given as -oOPTION (in lower case) or in the long form --format='Option,AnotherOption...'.

So far so good. But there is a big caveat here:

What you see here is perfect for getting an idea of what you have run or what to expect in terms of CPU hours. But this will not necessarily reflect your real budget status: in many cases each node or partition has an extra parameter, the weight, which is set for accounting purposes and is not part of SLURM. For instance, the GPU nodes may have a weight of x3, which means that each GPU hour is counted as 3 SBU instead of 1 for budgetary purposes. In short, you can use sacct to gain insight into CPU times, but this will not necessarily reflect how many SBU credits you still have.
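To illustrate the weighting idea, here is a hedged sketch that applies a per-partition weight to CPU seconds. The partition name `gpu`, the weight of 3, the default weight of 1, and the function name `weighted_sbu` are all made-up assumptions; the real values depend on your site, so ask your administrators:

```shell
# Read "partition cpu_seconds" pairs on stdin and print the weighted total in hours.
# The weights are invented examples, not values taken from any real cluster.
weighted_sbu() {
    awk '
        BEGIN { weight["gpu"] = 3; default_weight = 1 }
        NF >= 2 {
            w = ($1 in weight) ? weight[$1] : default_weight
            total += w * $2 / 3600
        }
        END { printf "%.2f\n", total }
    '
}

# Hypothetical usage (sacct -P separates fields with "|"):
#   sacct -u "$USER" -X -n -P -S 2019-01-01 -o partition,cputimeraw | tr '|' ' ' | weighted_sbu
```

Even with the right weights, treat the result as an estimate and reconcile it with the balance your administrators report.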

runlevel0