
I simply want to open a compressed/uncompressed file in the background and produce a new file based on the processing done on the compressed file.

I could do it with Parallel::ForkManager, but I believe that is not available.

I found this, but am not sure how to use it:

sub backgroundProcess {
    my $file = shift;
    my $pid  = fork;
    die "fork failed: $!" unless defined $pid;
    return if $pid;    # in the parent process
    &process_file($file);
    exit;              # end child process
}

sub process_file {
    my $file    = shift;
    my $outFile = $file . ".out";
    # ...here...
    open( readHandle,  "<", $file )    or die "failed read $!";
    open( writeHandle, ">", $outFile ) or die "failed write $!";
    # some processing here.....
    # and then closing handles...
}

The loop:

foreach my $file (@filesToProcess) {
    &backgroundProcess($file);
}

My questions:

  1. Does the child process created in backgroundProcess keep running after the parent returns (at the line return if $pid)?
  2. In process_file, how do I make sure a unique filehandle is opened for each file, or will fork take care of it?
  3. In the loop over @filesToProcess, I want to run only a certain number of processes at a time. How do I check whether the number of background processes has reached $LIMIT, and start a new one only when an old one finishes?
Jonathan Hall
rajeev
  • If you don't want to use a CPAN module, I suggest http://stackoverflow.com/questions/17155204/lightweight-fork-replacement-for-threads `my $fork = fasync { "do unzip" }; $fork->();` – mpapec Sep 24 '14 at 14:25
  • Use `Parallel::ForkManager`. What do you mean by *"that is not available"*? – Borodin Sep 24 '14 at 14:30
  • Also: Don't prefix your sub call with '&'. That used to be necessary, but is now redundant at best - and breaks things in subtle ways at worst. – Sobrique Sep 24 '14 at 14:39
  • I'd assume 'not available' means the OP is under some sort of policy constraint that disallows or restricts downloading and using stuff from 'tinternet. It's not uncommon for places to be really fussy about key servers. – Sobrique Sep 24 '14 at 14:43
  • @Sobrique I'd like to make the same assumption, but *most* people who say they "can't use CPAN" actually mean that they don't have root access and don't know how to install/use modules without admin rights. – ThisSuitIsBlackNot Sep 24 '14 at 14:46
  • I do not have admin rights, and there is too much bureaucracy to add anything anyway, so the regular install of Perl v5.6.1 is all I have. – rajeev Sep 24 '14 at 14:50
  • You don't need admin rights. https://github.com/tokuhirom/plenv – mpapec Sep 24 '14 at 14:55
  • @rajeev You [don't need](http://stackoverflow.com/q/3735836/176646) [root](http://stackoverflow.com/q/2980297/176646) [to use](http://stackoverflow.com/q/251705/176646) [CPAN](http://stackoverflow.com/q/13957431/176646). Worst case, download the source and include it with your script; the licensing for most modules is pretty permissive. I'd be more worried about your **ancient** version of Perl than anything else, though. 5.6.1 was released in 2001! – ThisSuitIsBlackNot Sep 24 '14 at 14:57
  • @Sobrique, downloads from StackOverflow are just as much downloads from the internet as downloads from CPAN. – ikegami Sep 24 '14 at 15:01
  • You don't need to convince me. I'm merely trying to point out - lots of companies have policies about what's permitted on their systems. Whether that policy is sane or appropriate is rarely in the hands of the person trying to get their job done. – Sobrique Sep 24 '14 at 15:16

2 Answers


If I understand the title of your question, you are looking for Parallel::ForkManager.

I do not understand why Parallel::ForkManager is not available. It is a pure Perl module.

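use Parallel::ForkManager;

my $MAX_PROCESSES = 4;  # example limit

my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

for my $file (@filesToProcess) {
  # Forks; returns the child's pid in the parent (so we skip to the next file) and 0 in the child:
  my $pid = $pm->start and next;

  process_file($file);  # do the work for $file in the child process

  $pm->finish; # Terminates the child process
}

$pm->wait_all_children; # Block until the remaining children have exited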

You can just copy the module's .pm file to a directory your script can find. For example:

/some/custom/path/myscript
/some/custom/path/inc/Parallel/ForkManager.pm

Then, in myscript:

use FindBin qw( $RealBin );
use lib "$RealBin/inc";
use Parallel::ForkManager;

And, of course, if, for some unfathomable reason you can't do that, you can always fatpack your script.

ikegami
Sinan Ünür
  • What is the best way for *long living* forks to share data with parent? – mpapec Sep 24 '14 at 14:41
  • Not every site allows download and installation of ad-hoc stuff from the internet. As well respected as CPAN is, I'd still have fun and games getting it deployed on certain of my production servers. – Sobrique Sep 24 '14 at 14:42
  • @Sobrique it should not be a problem if these are pure perl modules? – mpapec Sep 24 '14 at 14:43
  • From my perspective, I'd shrug and go 'yeah, that's fine'. To get the change signed off, I'd have to convince a bunch of people that there's no danger of malicious, buggy or harmful code within the module. This might take quite a bit of time and money, and might go unfunded for a long time unless I can make a particularly compelling reason as to why I need it. (And 'because hand rolling a fork limiter is messy' is probably not strong enough). – Sobrique Sep 24 '14 at 14:47
  • @Sobrique I understand you, but the counterargument is that hand-rolled code could be even buggier and have hidden security holes, since it wasn't widely tested. `:)` – mpapec Sep 24 '14 at 14:53
  • You don't need to convince me on this point. I'm merely trying to point out that not all organisations take this view. – Sobrique Sep 24 '14 at 15:34
  • Thanks, I was able to use the module like that; I will now try to run it. – rajeev Sep 25 '14 at 17:21

Re Q1: Yes. fork() returns the child's PID (a true value) in the parent and 0 in the child, so only the parent executes the return; the child keeps running on its own until it reaches exit.
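A minimal sketch of the Q1 behaviour, assuming a Unix-like system where fork() is native. The child writes a hypothetical marker file (`$file.done`) after the parent has already returned from backgroundProcess, which shows the child running independently; the final wait() loop only reaps the children.

```perl
use strict;
use warnings;

$| = 1;    # unbuffered STDOUT, so children don't re-flush the parent's pending output

# Forks a child for $file. In the parent, returns the child's PID immediately.
sub backgroundProcess {
    my ($file) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;    # undef means fork failed
    return $pid if $pid;                          # parent: fork returned the child's PID

    # Child: fork returned 0, so execution continues here even though
    # the parent has already returned from this sub.
    open(my $out, '>', "$file.done") or die "cannot write $file.done: $!";
    print $out "handled by child $$\n";
    close($out) or die "cannot close $file.done: $!";
    exit 0;    # end the child so it never runs the caller's loop
}

for my $file (@ARGV) {
    my $pid = backgroundProcess($file);
    print "parent: started child $pid for $file\n";
}

1 while wait() != -1;    # reap every child before the parent exits
```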

Re Q2: Not sure if I'm understanding your question correctly. open() is executed in the child process, so the filehandles belong to that child; each child opens its own handles, even with bareword names like readHandle.
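A minimal sketch for Q2, assuming a Unix-like system. Here process_file uses lexical filehandles (`my $in`, `my $out`) instead of the barewords, and upper-casing each line is only a hypothetical placeholder for the real processing. Because every call runs in its own child, each file gets its own pair of handles.

```perl
use strict;
use warnings;

$| = 1;    # flush STDOUT immediately so children don't inherit unflushed output

# Opens its own lexical handles on every call; nothing is shared between children.
sub process_file {
    my ($file) = @_;
    my $out_file = "$file.out";
    open(my $in,  '<', $file)     or die "failed to read $file: $!";
    open(my $out, '>', $out_file) or die "failed to write $out_file: $!";
    while (my $line = <$in>) {
        print $out uc $line;    # placeholder processing (hypothetical)
    }
    close($in);
    close($out) or die "failed to close $out_file: $!";
}

sub backgroundProcess {
    my ($file) = @_;
    my $pid = fork;
    die "fork failed: $!" unless defined $pid;
    return $pid if $pid;    # parent
    process_file($file);    # child does the work...
    exit 0;                 # ...and then ends
}

backgroundProcess($_) for @ARGV;
1 while wait() != -1;       # reap all children
```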

Re Q3: You'll have to keep track manually. Once the limit has been reached, call wait() to wait for one child to exit before starting a new child process. See http://perldoc.perl.org/functions/wait.html
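A minimal hand-rolled limiter for Q3, using only fork() and wait() (no CPAN module) and assuming a Unix-like system. The names process_all and %running are illustrative, and the plain copy in process_file stands in for the real work. Once $limit children are running, wait() blocks until one of them exits before the next fork.

```perl
use strict;
use warnings;

$| = 1;

my $LIMIT = 3;    # maximum number of concurrent children (example value)

# Placeholder for the real per-file work (hypothetical): copy input to "$file.out".
sub process_file {
    my ($file) = @_;
    open(my $in,  '<', $file)       or die "failed to read $file: $!";
    open(my $out, '>', "$file.out") or die "failed to write $file.out: $!";
    print $out $_ while <$in>;
    close($in);
    close($out) or die "failed to close $file.out: $!";
}

# Forks one child per file, never running more than $limit at once.
# Returns the number of children that exited with a non-zero status.
sub process_all {
    my ($limit, @files) = @_;
    my %running;        # PID => file, for children still running
    my $failures = 0;
    for my $file (@files) {
        # At the limit: block until one child exits, then forget it.
        if (scalar(keys %running) >= $limit) {
            my $done = wait();
            $failures++ if $? != 0;
            delete $running{$done};
        }
        my $pid = fork;
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {           # child
            process_file($file);
            exit 0;
        }
        $running{$pid} = $file;    # parent records the new child
    }
    # Wait for the children that are still running.
    while (keys %running) {
        my $done = wait();
        last if $done == -1;
        $failures++ if $? != 0;
        delete $running{$done};
    }
    return $failures;
}

exit(process_all($LIMIT, @ARGV) ? 1 : 0) if @ARGV;
```

Run it as `perl script.pl file1 file2 ...`; a non-zero exit status means at least one child failed.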

Sebastian