I have written a little Perl script that starts a program multiple times in a for loop, each time with different parameters. The program does a numerical calculation and uses a whole CPU if it can get one. I have several CPUs available, so ideally I want to run as many instances of the program at once as there are available CPUs, but not more. Since other processes may be running, the number of available CPUs is not always the same.

What I have done so far is:

#!/usr/bin/perl

use strict;
use warnings;

use IPC::Open2;
use Parallel::ForkManager;

my $program = "./program";

# Upper limit on simultaneous children (fixed at 44 for now)
my $pm = Parallel::ForkManager->new(44);

for my $x (0..100) {
    $pm->start and next;    # parent: move on to the next parameter

    # child: run one instance of the program
    my ($out, $in);
    my $pid = open2($out, $in, $program);

    print $in <<EOF;
          #input involving $x
EOF
    close $in;              # done writing, so the program sees end of input

    my $printstring = "";
    while (<$out>) {
        if (/^\s*1\.000\s+(-\S+)D(\S+)\s*$/) {
            $printstring .= "$1e$2";
        }
    }
    print $printstring, "\n";
    waitpid($pid, 0);
    $pm->finish;
}
$pm->wait_all_children;
print "\n\n END\n";

This obviously hard-codes the number of processes to start, and therefore the number of CPUs that can be used. I have no idea how to determine the number of available CPUs at run time and adjust the number of children accordingly. Any ideas how to do this?

Update:

Just to be clear, the limiting factor here is definitely the CPU time and not I/O stuff.

I looked into loadavg, but I am confused by its output.

68.71 66.40 63.72 70/1106 19247

At the same time, top showed

Tasks: 978 total,  23 running, 955 sleeping,   0 stopped,   0 zombie
Cpu(s):  2.1%us,  1.5%sy, 93.3%ni,  3.1%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st

The number of CPUs is 48, so I would have thought that if the fourth number (in this case 70) is greater than 48, I should not start any more child processes. According to top, however, there still seems to be some idle CPU, even though the fourth number is 70.

fifaltra
  • The right number of processes to start depends on what type of task is involved (memory-intensive? CPU-bound? I/O-intensive?). Take a look at `/proc/cpuinfo` to see what you have installed. – xxfelixxx Nov 30 '15 at 07:07
  • http://search.cpan.org/~jstowe/Linux-Cpuinfo-1.10/lib/Linux/Cpuinfo.pm – xxfelixxx Nov 30 '15 at 07:07
  • @xxfelixxx one run of the program uses 100% of one CPU if you let it. – fifaltra Nov 30 '15 at 09:01
  • @xxfelixxx I know what CPUs I have, and I don't need the script to determine hardware info. I have 48 CPUs available, which is why I put 44 as the maximum number of children, as this will leave some CPUs for other people who might want to compute something. I would rather find out how many of them are free at the moment and adjust the number of children accordingly. – fifaltra Nov 30 '15 at 09:04
  • @fifaltra, check the `loadavg`. In simplest terms, it tells you how many CPUs/cores are in use right now. If the number is larger than the number of CPUs, the system is overloaded. – Dummy00001 Nov 30 '15 at 10:11
  • I'd ask about the nature of the thing you're trying to run. I assume it's an intensive CPU workload (e.g. no IO) and will therefore run as fast as it can. If it's anything else, CPU time might not be your limiting factor. I'm a little bit wary of your read-regex-write loop there. I can't tell what that's doing. – Sobrique Nov 30 '15 at 10:39
  • @Sobrique Yes, it's CPU intensive, the regex just waits for the result that is printed to standard output and reformats it nicely (I shortened the script there a bit), so the CPU is definitely the limiting factor here. We're talking about 25 min per run and about 100 lines of output that have to go through the regex. – fifaltra Nov 30 '15 at 14:46
  • @Dummy00001 I looked into it, but I don't really see how this helps me, please check my update. – fifaltra Nov 30 '15 at 15:03
  • Do some research on "load balancing". I know several ways of getting the information but don't have the time to find the APIs or packages. – Taylor Kidd Nov 30 '15 at 15:51
  • GNU Parallel is a Perl script that does this sort of thing... – Mark Setchell Nov 30 '15 at 16:33
  • @fifaltra, the 70 is the number of runnable tasks/threads, and 1106 is the total number of tasks/threads. (In the `top` output, only 3% of CPU time is idle. IOW, the system is nearly fully loaded.) For you, the first number in the `loadavg` output is the most relevant: it is the average number of runnable tasks over the last minute. The 68.71 means that the system had on average, at any given time, 68.71 tasks to run on your 48 CPUs. IOW, the CPUs are overloaded: there are more tasks to run than CPUs. – Dummy00001 Nov 30 '15 at 17:01
  • @fifaltra, the main problem with `loadavg` is that it updates very slowly. I have used it for load balancing of tasks which run over a very long span of time (hours to days). The slowness of `loadavg` is IMO not really a bug but a feature: reacting too fast risks overreacting and harming interactivity. The way I did it was to increment/decrement the thread pool size (by 1 or 2) every minute or so, to make sure that the system has at least 20% of spare CPU time for normal user activity (`20% spare` == `loadavg <= num_cpus*0.8`). – Dummy00001 Nov 30 '15 at 17:10
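
The `loadavg` fields and the slow step-by-step adjustment described in the comments above could be sketched roughly as follows. This is only an illustration, not anyone's actual script: the sub names `parse_loadavg` and `next_pool_size` are made up, the 0.8 factor is the 20% spare suggested in the comment, and the sample line is the one quoted in the question (on a Linux system you would read the line from `/proc/loadavg` instead).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line in /proc/loadavg format, e.g.
# "68.71 66.40 63.72 70/1106 19247". Returns a hash ref, or nothing on failure.
sub parse_loadavg {
    my ($line) = @_;
    my ($one, $five, $fifteen, $runnable, $total, $last_pid) =
        $line =~ m{^(\S+)\s+(\S+)\s+(\S+)\s+(\d+)/(\d+)\s+(\d+)}
        or return;
    return {
        load1    => $one,       # 1-minute load average
        load5    => $five,      # 5-minute load average
        load15   => $fifteen,   # 15-minute load average
        runnable => $runnable,  # currently runnable tasks
        total    => $total,     # total number of tasks
        last_pid => $last_pid,  # most recently created PID
    };
}

# Move the pool size one step towards the target load (hypothetical policy):
# shrink by 1 when the load is above 80% of the CPUs, grow by 1 when below.
sub next_pool_size {
    my ($current, $load1, $num_cpus, $max) = @_;
    my $target = 0.8 * $num_cpus;
    return $current - 1 if $load1 > $target && $current > 1;
    return $current + 1 if $load1 < $target && $current < $max;
    return $current;
}

my $sample = parse_loadavg("68.71 66.40 63.72 70/1106 19247");
printf "1-minute load %.2f, %d runnable of %d tasks\n",
    @{$sample}{qw(load1 runnable total)};
print "next pool size: ", next_pool_size(44, $sample->{load1}, 48, 44), "\n";
```

With the numbers from the question (load 68.71 on 48 CPUs, pool of 44) this suggests shrinking the pool to 43. Calling such a function about once a minute and passing the result to Parallel::ForkManager's `set_max_procs` would be one way to apply it, assuming the installed version provides that method.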

2 Answers

I'm going to suggest a slightly different tack: instead of 'throttling' the number of active processes based on load, why not make use of SIGSTOP and SIGCONT?

Parallel::ForkManager gives you a `running_procs` method, which returns a list of PIDs.

You can then signal those to STOP when the load average gets 'too high'.

You can work out what "too high" means using Sys::Info::CPU (which also tells you the load), or perhaps look at the question "Number of processors/cores in command line".

Notionally, when the load goes too high, send SIGSTOP to some of your child processes. They should drop out of the run queue and remain visible but suspended. Send SIGCONT to resume them once the load has dropped again.

The load average gives you three numbers: the 1-minute, 5-minute and 15-minute CPU load. Look at the first; if it is greater than the number of CPUs, you have contention.
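
A minimal sketch of the suspend/resume idea, using only Perl's built-in `kill`. The sub names and the "one worker per unit of excess load" policy are assumptions for illustration; in a real script the PID list would come from `$pm->running_procs` and the load from the first `loadavg` number. The demo at the end forks a throwaway child just to have something to signal.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# How many workers to suspend so the load drops to about $num_cpus.
# Hypothetical policy: one worker per unit of excess load, rounded up,
# capped at the number of workers actually running.
sub excess_workers {
    my ($load1, $num_cpus, $running) = @_;
    my $excess = $load1 - $num_cpus;
    return 0 if $excess <= 0;
    my $n = int($excess);
    $n++ if $n < $excess;
    return $n > $running ? $running : $n;
}

# Send STOP to the first $n PIDs; returns the PIDs that were signalled.
sub suspend {
    my ($n, @pids) = @_;
    $n = @pids if $n > @pids;
    return grep { kill 'STOP', $_ } @pids[0 .. $n - 1];
}

# Send CONT to the given PIDs; returns how many were signalled.
sub resume {
    my @pids = @_;
    return kill 'CONT', @pids;
}

# Demo: fork one sleeping child, suspend it, resume it, then clean up.
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ($pid == 0) { sleep 30; exit 0; }

my @stopped = suspend(excess_workers(49.5, 48, 1), $pid);
print "suspended: @stopped\n";
resume(@stopped);
kill 'TERM', $pid;
waitpid $pid, 0;
```

One caveat for the script in the question: each ForkManager child there is a Perl process that launches the real program through `open2`, and stopping the Perl child does not stop the program it started. The signal would have to go to the program's own PID (the value returned by `open2`) for the CPU to be freed.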

Sobrique

There will always be processes coming and going, and some will use more CPU than others. Another approach is to look at how busy each CPU is, based on its idle percentage. You can set a threshold and count the CPUs whose idle percentage is at or above it, then use that count to decide how many processes to start. Something like this should help, I believe:

#!/usr/bin/env perl

use strict;
use warnings;
use FileHandle;

# Get the number of cores that are at least $idle_percent idle
# (the threshold can be adjusted)
my $idle_percent=90;
my $free_cores=GetCores($idle_percent);
printf( "Cores over %s%% free: %s\n",$idle_percent,$free_cores);

sub GetCores {
    my $threshold=shift;
    my $cpu_idle_count=0;

    my $delta_time_sleep=2; #Amount of sleep between the 2 samples
    my @cpu_idle_totals;
    my @cpu_total_totals;

    for my $sample (0..1) {
        my $output_fh=FileHandle->new('/proc/stat','r') or die "No stat";
        # Get output of /proc/stat
        while ( my $line=$output_fh->getline() ) {
            chomp($line);
            my ($tag,$user,$nice,$system,$idle,$iowait,$irq,$softirq)
                =split( /\s+/, $line);

            # Per-CPU lines only (cpu0, cpu1, ...), not the aggregate "cpu" line
            if ( $tag=~ m/^cpu(\d+)$/ ) {
                my $cpu_number=$1;

                my $total=(
                    $user + $nice + $system + $idle
                    + $iowait + $irq + $softirq
                );

                if ( defined( $cpu_idle_totals[$cpu_number] ) ) {
                    my $idle_delta=$idle-$cpu_idle_totals[$cpu_number];
                    my $total_delta=$total-$cpu_total_totals[$cpu_number];
                    next if $total_delta <= 0; # no ticks elapsed, skip this CPU
                    my $idle_pct=100 * ($idle_delta/$total_delta);
                    printf("%s is %0.2f%% idle\n",$tag,$idle_pct);

                    if ( $idle_pct >= $threshold ) {
                        $cpu_idle_count++;
                    }
                }

                $cpu_idle_totals[$cpu_number]=$idle;
                $cpu_total_totals[$cpu_number]=$total;

            }
        }

        $output_fh->close();
        sleep $delta_time_sleep if $sample == 0; # wait only between the samples
    }

    return $cpu_idle_count;
}

Output:

cpu0 is 89.90% idle
cpu1 is 94.97% idle
cpu2 is 95.02% idle
cpu3 is 97.00% idle
cpu4 is 96.98% idle
cpu5 is 98.48% idle
cpu6 is 97.99% idle
cpu7 is 95.98% idle
Cores over 90% free: 7
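
To turn that count into a number of children to start, something along these lines might do. It is a sketch only: `children_to_start` is a made-up helper, the reserve of one core for other users is an arbitrary assumption, and the value 7 is taken from the sample output above rather than measured.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Number of children to start now: idle cores minus a reserve kept for
# other users, never negative and never more than the work still pending.
sub children_to_start {
    my ($free_cores, $reserve, $pending) = @_;
    my $n = $free_cores - $reserve;
    $n = 0 if $n < 0;
    return $n < $pending ? $n : $pending;
}

my @pending = (0 .. 100);   # the parameter values from the question
my $free    = 7;            # e.g. the value GetCores() returned above
my @batch   = splice @pending, 0, children_to_start($free, 1, scalar @pending);
print "start now: @batch\n";
print scalar(@pending), " parameters still waiting\n";
```

With 7 idle cores and one held back, this starts parameters 0 to 5 and leaves 95 waiting for the next measurement.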
Shizeon