A pipe connects the output of one process to the input of another process. Neither one knows about the other nor cares how it operates.
But putting things together like this breaks the Unix pipeline philosophy of small tools that each excel at one narrow job. If you fuse these two programs, you always have to do both tasks even if you want only one (you could add configuration to turn one off, but that's a lot of work).
I process a lot of LaTeX, and I control everything through a Makefile. I don't really care about what the commands look like and I don't even have to remember what they are:
short-clean.tex: short.tex
	cat short.tex | try1.pl | try2.pl > $@
Let's do it anyway
I'll limit myself to the constraint of basic concatenation instead of completely rewriting or rearranging the programs, mostly because there are some interesting things to show.
Consider what happens if you concatenate those two programs by simply adding the text of the second program to the end of the text of the first.
The output from the original first program still goes to standard output, and the second program no longer gets that output as its input.
The input to the combined program is likely exhausted by the original first program, so the second program has nothing left to read. That's just as well, because what it would have read is the unprocessed input intended for the first program.
There are various ways to fix this, but none of them make much sense when you already have two working programs that do their job. I'd shove that in the Makefile and forget about it.
But, suppose you do want it all in one file.
Rewrite the first section to send its output to a filehandle connected to a string. Its output is now in the program's memory. This uses basically the same interface, and you can even use select to make that string filehandle the default for print.
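Something like this would do it; the variable names are mine:

use v5.26;

my $output_string;
open my $string_fh, '>', \ $output_string or die "$!";

select $string_fh;       # print now defaults to the in-memory filehandle
print "HELLO THERE\n";   # this ends up in $output_string
select STDOUT;           # restore the default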
Rewrite the second section to read from a filehandle connected to that string.
Alternatively, you can do the same thing by writing to a temporary file in the first part, then reading that temporary file in the second part.
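File::Temp, which comes with perl, can create that temporary file and clean up after itself. A rough sketch, with the real work elided:

use v5.26;
use File::Temp qw(tempfile);

# the first part writes to the temporary file
my( $output_fh, $filename ) = tempfile( UNLINK => 1 );
print { $output_fh } "PROCESSED TEXT\n";
close $output_fh;

# the second part reads it back
open my $input_fh, '<', $filename or die "$!";
while( <$input_fh> ) { print }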
A much more sophisticated program would have the first program write to a pipe (inside the program) while the second program simultaneously reads from it. However, you'd have to rewrite almost everything so that the two programs run at the same time.
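One way to get that simultaneity without a complete redesign is to fork so each half runs as its own process. This is just a sketch of the shape; the real work would go in the marked spots:

use v5.26;

pipe my $reader, my $writer or die "$!";

my $pid = fork() // die "$!";
if( $pid == 0 ) {        # the child plays the part of the second program
	close $writer;
	while( <$reader> ) { print }   # the second program's work goes here
	exit;
	}

close $reader;           # the parent plays the part of the first program
while( <<>> ) {
	print { $writer } $_;          # the first program's work goes here
	}
close $writer;
waitpid $pid, 0;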
Here's Program 1, which uppercases most of the letters:
#!/usr/bin/perl
use v5.26;
$|++;

while( <<>> ) { # safer line input operator
	print tr/a-z/A-Z/r;
	}
and here's Program 2, which collapses whitespace:
#!/usr/bin/perl
use v5.26;
$|++;

while( <<>> ) { # safer line input operator
	print s/\h+/ /gr;
	}
They work serially to get the job done:
$ perl program1.pl
The quick   brown dog jumped over the lazy   fox.
THE QUICK   BROWN DOG JUMPED OVER THE LAZY   FOX.
^D
$ perl program2.pl
The quick   brown dog jumped over the lazy   fox.
The quick brown dog jumped over the lazy fox.
^D
$ perl program1.pl | perl program2.pl
The quick   brown dog jumped over the lazy   fox.
THE QUICK BROWN DOG JUMPED OVER THE LAZY FOX.
^D
Now I want to combine those. First, I'll make some changes that don't affect the operation but will make it easier for me later. Instead of using implicit filehandles, I'll make those explicit and one level removed from the actual filehandles:
Program 1:
#!/usr/bin/perl
use v5.26;
$|++;

my $output_fh = \*STDOUT;

while( <<>> ) { # safer line input operator
	print { $output_fh } tr/a-z/A-Z/r;
	}
Program 2:
#!/usr/bin/perl
use v5.26;
$|++;

my $input_fh = \*STDIN;

while( <$input_fh> ) {
	print s/\h+/ /gr;
	}
Now I have the chance to change what those filehandles are without disturbing the meat of the program. The while
doesn't know or care what that filehandle is, so let's start by writing to a file in Program 1 and reading from that same file in Program 2:
Program 1:
#!/usr/bin/perl
use v5.26;

open my $output_fh, '>', 'program1.out' or die "$!";

while( <<>> ) { # safer line input operator
	print { $output_fh } tr/a-z/A-Z/r;
	}

close $output_fh;
Program 2:
#!/usr/bin/perl
use v5.26;
$|++;

open my $input_fh, '<', 'program1.out' or die "$!";

while( <$input_fh> ) {
	print s/\h+/ /gr;
	}
However, you can no longer run these in a pipeline because Program 1 doesn't write to standard output and Program 2 doesn't read from standard input:

$ perl program1.pl
$ perl program2.pl
You can, however, now join the programs, shebang and all:
#!/usr/bin/perl
use v5.26;

open my $output_fh, '>', 'program1.out' or die "$!";

while( <<>> ) { # safer line input operator
	print { $output_fh } tr/a-z/A-Z/r;
	}

close $output_fh;

#!/usr/bin/perl
use v5.26;
$|++;

open my $input_fh, '<', 'program1.out' or die "$!";

while( <$input_fh> ) {
	print s/\h+/ /gr;
	}
You can skip the file and use a string instead, but at this point, you've gone beyond merely concatenating files and need a little coordination for them to share the scalar with the data. Still, the meat of the program doesn't care how you made those filehandles:
#!/usr/bin/perl
use v5.26;

my $output_string;
open my $output_fh, '>', \ $output_string or die "$!";

while( <<>> ) { # safer line input operator
	print { $output_fh } tr/a-z/A-Z/r;
	}

close $output_fh;

#!/usr/bin/perl
use v5.26;
$|++;

open my $input_fh, '<', \ $output_string or die "$!";

while( <$input_fh> ) {
	print s/\h+/ /gr;
	}
So let's go one step further and do what the shell was already doing for us.
#!/usr/bin/perl
use v5.26;

pipe my $input_fh, my $output_fh or die "$!";
$output_fh->autoflush(1);

while( <<>> ) { # safer line input operator
	print { $output_fh } tr/a-z/A-Z/r;
	}

close $output_fh;

while( <$input_fh> ) {
	print s/\h+/ /gr;
	}
From here it gets a bit tricky, and I'm not going to take the next step of polling filehandles so that one thing can write while the other reads. (Note that the single-process version above deadlocks if the first loop fills the pipe's buffer before the second loop gets a chance to read it.) There are plenty of things that do that for you. And you're now doing a lot of work to avoid something that was already simple and working.
Instead of all that pipe nonsense, the next step is to separate code into functions (likely in a library), and deal with those chunks of code as named things that hide their details:
use Local::Util qw(remove_comments minify);

while( <<>> ) {
	my $result = remove_comments($_);
	$result = minify( $result );
	...
	}
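Local::Util and its functions are made up for this example, but each one would be an ordinary subroutine that takes a string and returns the transformed string, something like:

package Local::Util;   # hypothetical module for this example
use strict;
use warnings;
use Exporter qw(import);
our @EXPORT_OK = qw(remove_comments minify);

sub remove_comments { my $s = shift; $s =~ s/%.*$//mgr }  # strip LaTeX-style comments
sub minify          { my $s = shift; $s =~ s/\h+/ /gr }   # collapse horizontal whitespace

1;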
That can get even fancier: you simply go through a series of steps without knowing what they are or how many of them there will be. And since all the baby steps are separate and independent, you're basically back to the pipeline notion:
use Local::Util qw(get_input remove_comments minify);

my $result;
my @steps = qw(get_input remove_comments minify);

while( ! eof() ) { # or whatever
	no strict 'refs';
	$result = &{$_}( $result ) for @steps;
	}
A better way makes that an object so you can skip the soft reference:
use Local::Processor;

my @steps = qw(get_input remove_comments minify);
my $processor = Local::Processor->new( @steps );

my $result;
while( ! eof() ) { # or whatever
	$result = $processor->$_($result) for @steps;
	}
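Local::Processor is made up too. The only trick it needs is that each step is a regular method, so the dynamic method call $processor->$_($result) resolves by name. A minimal sketch:

package Local::Processor;   # hypothetical module for this example
use strict;
use warnings;

sub new { my( $class, @steps ) = @_; bless { steps => \@steps }, $class }

# each step method takes the previous result and returns the next one
sub get_input       { my( $self, $result ) = @_; scalar <> }
sub remove_comments { my( $self, $result ) = @_; $result =~ s/%.*$//mgr }
sub minify          { my( $self, $result ) = @_; $result =~ s/\h+/ /gr }

1;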
As before, the meat of the program neither knows nor cares about the steps ahead of time. That means you can move the sequence of steps to configuration and use the same program for any combination and sequence:
use Local::Config;
use Local::Processor;

my @steps = Local::Config->new->get_steps;
my $processor = Local::Processor->new;

my $result;
while( ! eof() ) { # or whatever
	$result = $processor->$_($result) for @steps;
	}
I write quite a bit about this sort of stuff in Mastering Perl and Effective Perl Programming. But just because you can do it doesn't mean you should. This reinvents a lot that make can already do for you. I don't do this sort of thing without good reason; bash and make have to be pretty annoying to motivate me to go this far.