
I have a Perl script which writes messages to STDOUT and STDERR (via print/croak statements), and I also redirect STDOUT and STDERR to a logfile:

File::Tee::tee STDOUT, ">>", "$logFile" ;
File::Tee::tee STDERR, ">>", "$logFile" ;

Now the logfile shows the STDOUT and STDERR messages out of order, and the output on the terminal is out of order as well. I have tried flushing the buffers (as recommended here: https://perl.plover.com/FAQs/Buffering.html) but it doesn't help:

select(STDERR) ;
$| = 1 ;
select(STDOUT) ;
$| = 1 ;
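
(For reference, the same effect without the select juggling, using the autoflush method that IO::Handle provides:)

use IO::Handle ;        # provides the autoflush method on filehandles
STDOUT->autoflush(1) ;
STDERR->autoflush(1) ;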

Does anyone know what I must do to see the output in order? (I also tried additionally flushing the filehandle corresponding to $logFile, but it's still the same.)


EDIT:

Thanks to all of you who have replied. A lot of the discussion over this ended up in comments, so I am going to list the few things which I tried based on feedback from all of you.

  1. I was already flushing STDOUT and STDERR before I used File::Tee. As @jimtut suspected, File::Tee was indeed the culprit: removing it restored the ordering on the console. But I still wanted to redirect STDOUT and STDERR to a logfile.
  2. @mob suggested using IO::Tee instead, but I haven't fully understood how to make that work the way I want in my code.
  3. @briandfoy pointed out that there isn't a reliable way to ensure two separate filehandles are seen in the correct order in realtime, and suggested using a logging routine which is the only place that writes to STDOUT/STDERR. @zdim further pointed out that File::Tee uses fork, which is the heart of the issue, since two processes cannot guarantee any ordering of their output.
  4. Since File::Tee was to blame, I removed it from the code. I updated my logger function to print to STDOUT/STDERR and additionally to the $log filehandle. To also capture the warns in the log, I did the following:
sub warning_handler {
  my $msg = $_[0] ;
  print STDERR $msg ;                 # keep the warning on the console
  print $log $msg if defined $log ;   # and copy it to the logfile
}
$SIG{__WARN__} = \&warning_handler ;

This worked great for all of the code under my control. Everything was now printing in order, both on the console and in the logfile. However, I realized I can't use this solution, since I am also calling someone else's Perl packages for some functionality, and clearly I can't intercept the print/croak etc. that write to STDOUT/STDERR within an 'off-the-shelf' package. So right now I don't have a good solution. However, I suspect that if I can find some way to intercept STDOUT/STDERR within Perl, I might be able to get what I need.


EDIT2: I added my own answer, which is probably the closest I got to solving the problem, by modifying mob's solution to use IO::Tee instead of File::Tee; but even this misses some messages (though it fixes the ordering).


EDIT3: Finally found the 'solution'

use IO::Tee ;
use Capture::Tiny qw(capture);
...
...
select(STDERR) ;
$| = 1 ;
select(STDOUT) ;
$| = 1 ;
open (my $log, ">", $logfilename) ;
*REALSTDOUT = *STDOUT ;
*REALSTDERR = *STDERR ;
*STDOUT = IO::Tee->new(\*REALSTDOUT, $log);
*STDERR = IO::Tee->new(\*REALSTDERR, $log);

# Regular Perl code here which sends output to STDOUT/STDERR
...
...
# system calls / calls to .so need to be captured
&log_streams(sub { &some_func_which_calls_shared_object() ; }) ;

sub log_streams {
    my ($cr, @args) = @_;  # code reference, with its arguments
    my ($out, $err, $exit) = capture { $cr->(@args) };

    if ($out) {
        print STDOUT $out;
    }
    if ($err) {
        print STDERR $err;
    }
}

The use of IO::Tee ensures that all Perl-generated output to the console also goes to the logfile, and this happens immediately, so the log and console are updated in realtime. Since IO::Tee changes what the STDOUT/STDERR filehandles refer to (they now refer to the teed handles), it can only intercept output from Perl statements; it misses system calls, since those bypass Perl's STDOUT/STDERR handles. So we capture the system-call output and then use the log_streams routine to forward it to the now-aliased STDOUT/STDERR streams. This delays the system-call output's appearance in the log/terminal, but there is no delay for Perl-generated output - the best of both worlds.

Do note that the ordering of stderr and stdout generated by a single invocation of some_func_which_calls_shared_object is not preserved, since the log_streams routine prints all of STDOUT first and then all of STDERR; as long as the system call is atomic and doesn't interleave stdout/stderr messages much, we should be OK.

I appreciate the solutions from briandfoy, mob and zdim, whose answers I combined to arrive at this one! I never thought it would require going into this much detail for what seems like a very simple problem.

backstreetrover
  • If you REMOVE the use of `tee`, and just print to the console, is it in order? I'm guessing that it is, and `tee` is to blame. – jimtut Feb 14 '20 at 19:03
  • Indeed, File::Tee::tee is the problem. Removing it fixes the output ordering on the terminal. But I really wanted to redirect this to a logfile. – backstreetrover Feb 14 '20 at 19:09
  • If you're using a command-line Perl script (not a CGI script), create a "wrapper" script (Shell or Batch, depending on your platform). Inside that wrapper, call your Perl script and then redirect STDOUT and STDERR. It looks like this for Linux, and I believe it's almost identical for Windows: `myscript.pl >> logfile 2>&1` – jimtut Feb 14 '20 at 19:20
  • Yeah, I am aware I can redirect outside the script, but this isn't meant to have a bash/csh wrapper; it's meant to be a direct Perl utility. – backstreetrover Feb 14 '20 at 19:25
  • Did you do `select(STDERR) ; $| = 1 ; select(STDOUT) ; $| = 1 ;` *before* calling `tee`? A quick look at the code indicates this should work. – ikegami Feb 15 '20 at 07:51
  • @ikegami yes I did. It does not work. – backstreetrover Feb 15 '20 at 22:51
  • Please update the question to show everything you've tried. Hiding that in comments doesn't help people help you. – brian d foy Feb 16 '20 at 00:00

4 Answers


With two separate file handles, there's no contract or guarantee that you'll see them in real time. Various settings and buffers affect that, which is why you see the auto flush stuff ($|). It's the same idea for files or the terminal.

Realize this is an architectural issue rather than a syntactic one. You have two things competing for the same resource. That usually ends in tears. I hesitate to suggest a solution when I don't know what the problem is, but consider having whatever is trying to write to STDOUT or STDERR write to some sort of message broker that collects all the messages and is the only thing that writes to the final (shared) destination. For example, things that want to add entries to the syslog don't write to the syslog; they send messages to the thing that writes to the syslog.

A more Perly example: in Log4perl, you don't write to the final destinations. You simply log a message, and the logger is the single thing that figures out how to handle it. When I want this sort of behavior without the module, I don't use output facilities directly:

debug( "Some debug message" );

sub debug {
    my $message = shift;
    output( "DEBUG: $message" );
    }

sub output { # single thing that can output message
    ...
    }

Then do whatever you need to do in output.
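
For instance, a minimal output might look like this (a sketch only - the logfile name is made up):

use feature 'state';

sub output { # the single thing that touches the shared destinations
    my $message = shift;
    state $log = do {                           # opened once, on first call
        open my $fh, '>>', 'app.log' or die $!;
        $fh;
        };
    print STDOUT "$message\n";
    print $log "$message\n";
    }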

But you can't always control that in other things that are also trying to output things. Perl lets you get around this by redefining what warn and friends do by putting a coderef in $SIG{__WARN__}. You can capture warning messages and do whatever you like with them (such as sending them to standard output). Beyond that is black magic that reopens STDERR onto something you can control. It's not that bad, and it's isolated in one place.
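
That reopening trick, sketched (keep a dup of the original handle so the real terminal stays reachable; the filename is illustrative):

open my $realerr, '>&', \*STDERR  or die "Can't dup STDERR: $!";   # save the real STDERR
open STDERR, '>>', 'captured.log' or die "Can't reopen STDERR: $!";
warn "this now lands in captured.log\n";
print $realerr "this still reaches the terminal\n";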

At some point, someone else won't want merged output, and the intrusive solutions make it impossible to separate the streams. I'd much prefer flexibility over a hard-coded constraint. If I want just the errors, I want a way to get just the errors. There are many other sorts of workarounds, such as wrappers that collect both output streams (so, not at all intrusive) and various command-line redirections.

brian d foy
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/207908/discussion-on-answer-by-brian-d-foy-stdout-and-stderr-out-of-order). – Bhargav Rao Feb 16 '20 at 00:15
  • @briandfoy Aah, my bad - sorry for misunderstanding. No wonder my messages are out of order if it's forking! I am going to get rid of File::Tee and simply handle the logging to a file in my logger subroutine, and will also use a $SIG{__WARN__} handler to redirect warns to the logger routine. This should solve all problems. – backstreetrover Feb 16 '20 at 00:17

You will have two filehandles writing to $logfile. Unless File::Tee takes care to seek to the end of the file before every write (and it doesn't appear to), you will get a race condition where one filehandle overwrites the other.

A workaround would be to use the reopen option of the File::Tee::tee function - that closes the file after each write and reopens it (at the proper end of the file) before the next write. That could hurt your performance, though, depending on how often you write to those filehandles.
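
If I'm reading the File::Tee docs right, that looks something like this (I haven't tested this exact incantation, so treat the hashref-of-options spelling as something to verify against the docs):

use File::Tee qw(tee);
# reopen => 1 makes tee close the logfile after each write and
# reopen it (at the true end of file) before the next one
tee STDOUT, { mode => '>>', reopen => 1 }, $logFile;
tee STDERR, { mode => '>>', reopen => 1 }, $logFile;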


You might also have better luck with IO::Tee, which is a more straightforward implementation (using tied filehandles) than File::Tee's (a background process for each File::Tee::tee call), so you may get fewer surprises. Here is how an IO::Tee solution might look:

use IO::Tee;
$| = 1;
open my $stdout, ">&=1";  # to write to original stdout
open my $stderr, ">&=2";  # to write to original stderr
open my $fh_log, ">>", $logfile;
*STDOUT = IO::Tee->new($stdout, $fh_log);
*STDERR = IO::Tee->new($stderr, $fh_log);
...

There are no background processes, extra threads, or anything else to cause a race condition. STDOUT and STDERR will both write to the same log filehandle from the same process.

mob
  • But this doesn't explain why the output on the terminal is also out of order. Surely it has nothing to do with the LOGFILE filehandle. – backstreetrover Feb 14 '20 at 18:49
  • I tried using reopen with File::Tee. It didn't solve the problem. – backstreetrover Feb 14 '20 at 19:03
  • You might have better luck with `IO::Tee`, then. That is a more straightforward multiplexing implementation that I suspect will have fewer surprises. – mob Feb 14 '20 at 21:33
  • Not sure how I could use IO::Tee. As far as I understand, IO::Tee gives you one-to-many handle multiplexing - i.e. you can write to a single $tee handle and it will end up writing to multiple handles. I kind of want the opposite - I want STDOUT and STDERR to be merged into one filehandle (while preserving the console output). – backstreetrover Feb 15 '20 at 05:59
  • First open a filehandle to your logfile, and then set both of your `IO::Tee` objects to write on that filehandle. – mob Feb 15 '20 at 17:56
  • If I follow what you are saying, I will have 2 teed filehandles which I need to use in place of STDOUT and STDERR throughout the rest of my code. Further, I will need to capture the warns and redirect them to the teed filehandle. While I can specify the teed handle in prints, this wouldn't (?) be possible for croak and friends (cluck/confess). – backstreetrover Feb 15 '20 at 23:27
  • I will add an `IO::Tee` solution above. – mob Feb 16 '20 at 01:18
  • Thanks for the solution. This looks perfect and I certainly expected it to work. Strangely, however, it doesn't work well: with this solution the messages written to the logfile are in order, but the console ordering of STDOUT and STDERR is incorrect. Furthermore, messages from someone else's package which I call (itself SWIG'd from C++ to Perl) are completely missing from the log, though they show up on the console. No clue why. – backstreetrover Feb 16 '20 at 06:25
  • Using typeglob aliases instead of dup fixes the ordering on both console and logfile - see my answer to the original question. – backstreetrover Feb 16 '20 at 07:20

Note: The first part is done via tie-d handles; the solution in the second part uses Capture::Tiny.


A bare-bones proof-of-concept for an approach using tie-d handles.

The package that ties a handle, printing whatever is sent to it both to a file and to (a copy of) the STDOUT stream:

package DupePrints; 

use warnings; 
use strict; 
use feature 'say'; 

my $log = 't_tee_log.out'; 
open my $fh_out, '>', $log or die $!;  # for logging

# An independent copy of STDOUT (via dup2), for prints to terminal
open my $stdout, '>&', STDOUT or die $!;

sub TIEHANDLE { bless {} } 

sub PRINT { 
    my $self = shift; 

    print $fh_out @_; 
    print $stdout @_;
}

1;

A program that uses it

use warnings;
use strict;
use feature 'say';

use DupePrints;    
$| = 1;
tie *STDERR, 'DupePrints';
tie *STDOUT, 'DupePrints';

say "hi";

warn "\t==> ohno";

my $y;
my $x = $y + 7;

say "done";

This prints the following text to both the terminal and t_tee_log.out:

hi
        ==> ohno at main_DupePrints.pl line 14.
Use of uninitialized value $y in addition (+) at main_DupePrints.pl line 17.
done

See perltie and Tie::Handle for the details of tying filehandles.

The logging of the STDOUT and STDERR streams to a file (along with the copied printout to the terminal) also works across other modules that the main program may use.

To also have "clean" prints that don't get logged, copy the STDOUT handle in the main program, like it's done in the module, and print to that. If you need to use this in a more selective and sophisticated manner, please modify as needed - as it stands, it is meant to be only a basic demo.
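
For example (a tiny sketch; $clean is an illustrative name):

# in the main program, keep a plain copy of STDOUT before tying it
open my $clean, '>&', STDOUT or die $!;
tie *STDOUT, 'DupePrints';

print "logged and shown\n";                # goes through the tie
print $clean "shown only, not logged\n";   # bypasses the tie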


With the clarification in the question's edit, here is a different approach: wrap calls in Capture::Tiny, which captures all output from any code, and then manage the captured prints as needed.

use warnings;
use strict;
use feature qw(say state);

use Capture::Tiny qw(capture);

sub log_streams {
    my ($cr, @args) = @_;  # code reference, with its arguments

    # Initialize "state" variable, so it runs once and stays open over calls
    state $fh_log = do {
        open my $fh, '>', 'tee_log.txt' or die $!;
        $fh;
    };

    my ($out, $err, $exit) = capture { $cr->(@args) };

    if ($out) {
        print $fh_log $out;
        print         $out;
    }
    if ($err) {
        print $fh_log $err;
        print         $err;
    }
}

log_streams( sub { say "hi" } );
log_streams( sub { warn "==> ohno" } );
log_streams( sub { my $y; my $x = $y + 7; } );

log_streams( sub { system('perl', '-wE', q(say "external perl one-liner")) } );

log_streams( sub { say "done" } );

The downside of all this is that everything needs to run via that sub. But then again, that's actually a good thing, even if sometimes inconvenient.

The state feature is used to "initialize" the filehandle because a variable declared as state is never re-initialized; so the file is opened only once, on the first call, and stays open.

This is also a demo in need of completion.

zdim
  • To the reader: I've carefully reviewed this and don't get the -1. If anybody does, I'd appreciate comments. – zdim Feb 15 '20 at 22:31
  • @zdim thanks a lot for the reply. However, I am not sure I understand how any of this works. Anyhow, I will read more on tie'd filehandles, since I suspect this will lead me to the solution. – backstreetrover Feb 16 '20 at 05:22
  • @backstreetrover Oh, sorry, will add some commentary. This is indeed one way to get what you want. (While the posted demo works as it stands it should be filled in, per docs. That shouldn't be hard.) – zdim Feb 16 '20 at 05:56
  • @backstreetrover Added a section with another approach, which does all that's spelled out in your edit. – zdim Feb 16 '20 at 06:49
  • Yes, thanks for the Capture::Tiny example - maybe I can actually use it, since I just need to enclose my whole script in log_streams. But it looks like it delays the printing a bit, since it captures everything while executing and then prints at the end. The upside is I get the order correct; the downside is that the output is delayed, which is bad since my script may take a while to complete. – backstreetrover Feb 16 '20 at 07:40
  • @backstreetrover "_downside being the output is delayed_" --- correct, this module redirects in batches. To get prints right as they come, use `IPC::Run` (good to always use anyway :)), for example as in [this post](https://stackoverflow.com/a/51702507/4653379). Then you can have that in one sub, for external commands (with potentially unknown dynamics and prints), and use the sub from this post for your own prints. Could make for a nice little module (with yet other related routines). But I see that you put together a solution already, so that's great :) – zdim Feb 17 '20 at 06:49

After taking the hint from @mob's answer to use IO::Tee instead of File::Tee (since the latter uses fork, causing out-of-order STDERR vs STDOUT), I modified mob's original solution a bit and it worked (almost - read on):

use IO::Tee ;
...
...
open (my $log, ">", $logfilename) ;
*MYSTDOUT = *STDOUT ;
*MYSTDERR = *STDERR ;
*STDOUT = IO::Tee->new(\*MYSTDOUT, $log);
*STDERR = IO::Tee->new(\*MYSTDERR, $log);

This resulted in the correct ordering both on the console and in the logfile. (mob's original solution, which used open to dup STDOUT/STDERR, didn't work: it gave the correct order in the logfile, but out-of-order output on the console. Using a typeglob alias instead of a dup works, for some strange reason.)

However, as good as this solution sounds, it misses some messages in the logfile from a package which I call (though they do get printed on the console). My original code with File::Tee did show these same messages from the package in the logfile, so there is some voodoo going on somewhere. The specific package in question is a .so file, so I have no visibility into how exactly it prints its messages.

EDIT: I guess the .so file is effectively an external system command printing to stdout/stderr. Since its output doesn't go through Perl's I/O, the handles pointed to by the STDOUT/STDERR typeglobs within Perl will not see the output of external programs called from Perl. I guess the best solution is a combination of this answer for messages coming from within the Perl code, and Capture::Tiny::capture, as pointed out by @zdim, for capturing and redirecting messages from system calls / calls going through the SWIG interface.

backstreetrover