
The setup:

A data directory contains one directory for every day of the year, i.e. data/2014-01-01/ to data/2014-12-31/. I have a Perl script that I run individually inside each date directory.

I am trying to write a shell script that runs from data, goes through each directory from 2014-02-15 to 2014-07-20, and runs the Perl script inside each one. The Perl script takes about 20 seconds to run. This is what I have so far; it only runs on the February directories and doesn't wait for the Perl script to finish. I would like it to run on every directory in the range and wait for the Perl script to finish before starting the next iteration.

 #!/bin/bash

 folders=`find 2014-02*`

 for folder in $folders; do 
 cd $folder
 perl C:/Tools/script.pl
 cd ..
 done
john stamos
  • Why not add the folder traversing to the perl script rather than a separate shell script? Is each folder's perl script materially different? – quid Feb 03 '15 at 19:55
  • **1** Don't iterate `find` results like that, see [this answer](http://stackoverflow.com/a/7039579/418066). **2** `cd ..` won't take you back if you previously `cd`ed deeper than one directory. – Biffen Feb 03 '15 at 19:56
  • `find data/ -type d -name '2014-0[2-7]-(1[5-9]|20)' -exec C:/Tools/script.pl {} \;` The above command will work I believe (did not test), but it seems more reasonable to build a script that can take an arbitrary start and end point and apply whatever logic you want. Since it seems likely you will have to do this again. – Hunter McMillen Feb 03 '15 at 19:56
  • @Biffen Thanks for the info but do you have any solutions? – john stamos Feb 03 '15 at 19:57
  • @Hunter I'll try this. Will this wait for the perl script to finish over each iteration? And yes I'd like to have it implemented in a script to run again later. – john stamos Feb 03 '15 at 19:58
  • @johnstamos Your script is eerily similar to [this other question](http://stackoverflow.com/questions/28285639/jump-into-each-subfolder-and-back-again-with-bash), see the accepted answer for solutions. – Biffen Feb 03 '15 at 19:59
  • @HunterMcMillen The difference being that the directory is supplied as an argument, instead of being the working directory. – Biffen Feb 03 '15 at 20:00
  • I believe it will run them in the order they were found, but keep in mind that `find` holds its results in memory so for very large sets this will not be the best approach – Hunter McMillen Feb 03 '15 at 20:00
  • @Biffen That has nothing to do with running a perl script with a shell. – john stamos Feb 03 '15 at 20:00
  • @johnstamos No, but it's got to do with iterating `find` results and `cd`ing into directories and back again. – Biffen Feb 03 '15 at 20:01
  • @Biffen I think that is the only part I have working. I don't cd in or out more than one directory. – john stamos Feb 03 '15 at 20:03
  • @johnstamos You *currently* don't `cd` more than one directory. Just as you're *currently* not dealing with paths with spaces. If you want your script to be the least bit future proof you might want to change it. Oh and I just noticed your `find` command looks wrong, is it even returning what you want it to? – Biffen Feb 03 '15 at 20:06
  • You are correct. I've used it to locate files and just truncated the file part. How could I get it to return dirs? – john stamos Feb 03 '15 at 20:16
  • @johnstamos Hunter gave you a clue in a previous comment: `-type d` But that's not the only problem; the syntax is quite messed up. For details see `man find`. – Biffen Feb 03 '15 at 20:25
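
The points raised in the comments above (avoid iterating over `find` output, avoid `cd ..`, accept an arbitrary start and end date) can be combined into a rough bash sketch. The function name `run_in_range` and its argument order are made up for illustration, and it assumes bash (not plain sh) and directory names of the form YYYY-MM-DD. Because dates in that format sort as strings in chronological order, a plain string comparison is enough for the range check. Each command runs in the foreground, so the loop does not move on until it has exited.

```shell
#!/bin/bash

# Hypothetical helper: run a command inside every date-named subdirectory
# of BASE whose name lies within [START, END], one directory at a time.
# Usage: run_in_range BASE START END COMMAND [ARGS...]
run_in_range() {
    local base=$1 start=$2 end=$3
    shift 3
    local dir name
    # The glob expands in sorted order and copes with spaces in names.
    for dir in "$base"/*/; do
        [ -d "$dir" ] || continue          # glob matched nothing
        name=$(basename "$dir")
        # Only consider names shaped like YYYY-MM-DD.
        [[ $name =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || continue
        # Skip names that sort before START or after END.
        [[ $name < $start || $name > $end ]] && continue
        # The subshell keeps the cd local, so no "cd .." is needed.
        # The next iteration starts only after the command has exited.
        ( cd "$dir" && "$@" )
    done
}
```

For the case in the question, the call from the parent of `data` would look something like `run_in_range data 2014-02-15 2014-07-20 perl C:/Tools/script.pl`.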

1 Answer


Why not do it all in Perl? It has perfectly good traversal capability with the built-in File::Find module.

Encapsulate your 'script' as a subroutine.

#!/usr/bin/perl

use strict;
use warnings;
use File::Find;

sub your_script_sub {
    my ( $dir ) = @_;
    #do something with $dir. At a worst case, you could just run your script.
    #but there's no real reason to do that, as it's perl already. 
}

# Called by find() for every entry under the start directory.
sub run_script_in_dirs {
    if ( -d $File::Find::name ) {
        your_script_sub($File::Find::name);
    }
}

find ( \&run_script_in_dirs, "/path/to/your/dir" );

For bonus points, you could use threads to parallelise your 'run script in directory' step:

#!/usr/bin/perl
use strict;
use warnings;
use threads;
use Thread::Queue;
use File::Find;

my $num_threads = 4;
my $dir_q = Thread::Queue -> new(); 

sub your_script_sub {
   while ( my $dir = $dir_q -> dequeue() ) {
          # do something in $dir;
   }
}

sub find_dirs_to_run_script {
    if ( -d $File::Find::name ) {
        # Note the lowercase "name": $File::Find::Name is not set by File::Find.
        $dir_q -> enqueue($File::Find::name);
    }
}

for ( 1..$num_threads ) {
   threads -> create ( \&your_script_sub );
}

find ( \&find_dirs_to_run_script, "/path/to/dirs" );

$dir_q -> end();

foreach my $thr ( threads -> list() ) { $thr -> join() }
Sobrique
  • Looks good. It doesn't look like the range of the directories is included in this though right? Also my other perl script uses another program for analysis so it can only be used once at a time. – john stamos Feb 03 '15 at 20:51
  • No. The way perl does it, is with that subroutine, and you can test either `$_` or `$File::Find::name` for matching the pattern you're seeking. Are you sure your other program can only be used one at a time? Is that some licensing issue? – Sobrique Feb 03 '15 at 20:53
  • Yes, it uses another program for analysis that goes through and calculates a ton of data. Can't run two sets of data at once. – john stamos Feb 03 '15 at 20:55
  • *shrug*. Well, the first example then. But like I say - it's not particularly common that you have a program that can only run one instance at a time. (Aside from resource constraints). This will traverse anything under `/path/to/dirs` that is a directory. – Sobrique Feb 03 '15 at 20:57
  • `if ( -d $File::Find::name ) { your_script_sub($File::Find::name); }` So I'll change `name` to be the 2014-02-15 to 2014-07-20 range? – john stamos Feb 03 '15 at 20:57
  • `$File::Find::name` is a variable defined by `File::Find` and is the full path to the current file. `$_` is also defined, and is just the current filename. You can test either one for pattern matching/parsing. – Sobrique Feb 03 '15 at 20:59