
I'm trying to run old CGI scripts under FastCGI. Printing without extra parameters gives the proper output: `print $q->div( $q->param("text") )`

But when I pass an attributes hash as the first argument to a CGI method, `print $q->div( {-id=>"id"}, $q->param("text") )`, it garbles UTF-8 data ('õäöüžš' becomes 'õäöüžš').

It happens only with CGI parameters; variables defined in the script work fine (examples 3 and 4). Everything works perfectly under ordinary CGI (with the `-utf8` flag).
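The garbled string is the typical signature of double encoding: the UTF-8 bytes of each character are read back as Latin-1 characters and then encoded to UTF-8 a second time. The minimal sketch below uses only the core Encode module (no CGI or FastCGI) to reproduce that exact garbling. It also shows one plausible way a raw byte string gets reinterpreted: concatenating it with a character string that holds code points above 0xFF. That mechanism is an assumption about block 2, not something verified against CGI.pm's source.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # the literal below is UTF-8 source text
use Encode qw(encode decode);

my $chars = "õäöüžš";                  # 6 characters
my $bytes = encode('UTF-8', $chars);   # 12 bytes, like an undecoded form parameter

# Reading the bytes as Latin-1 turns every byte into its own character...
my $latin1_view = decode('ISO-8859-1', $bytes);
# ...and encoding that again gives the doubly encoded bytes that a
# browser renders as 'õäöüžš'.
my $double = encode('UTF-8', $latin1_view);

# Assumed mechanism: joining raw bytes with a character string containing
# code points above 0xFF makes Perl upgrade the bytes as Latin-1.
my $joined = $bytes . $chars;
my $prefix = substr($joined, 0, length $bytes);

# Decoding the input first keeps everything as characters.
my $decoded = decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "bytes: %d, latin1 view: %s\n", length $bytes, $latin1_view;
print "prefix matches the Latin-1 view\n" if $prefix eq $latin1_view;
print "decoded string round-trips\n"      if $decoded eq $chars;
```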

The FastCGI version of the example script, called as `test.fcgi?text=õäöüžš`, should print four identical blocks:

```perl
#!/usr/bin/perl -w --

use strict;
use CGI::Fast qw(:all);
use locale;
use utf8;

BEGIN {
        binmode(STDIN);                       # Form data
        binmode(STDOUT, ':encoding(UTF-8)');  # HTML
        binmode(STDERR, ':encoding(UTF-8)');  # Error messages
}

my ($q) = ();
my $test = "õäöüžš";

while ($q = new CGI::Fast) {

        print $q->header(-type=>"text/html", -charset=>"utf-8"),
                $q->start_html(-encoding=>"utf-8");

        print "1: ",
                $q->div(  $q->param('text') ),
                "<br />",
                "2: ",
                $q->div( {-id=>"id"},  $q->param('text') ),
                "<br />",
                "3: ",
                $q->div(  $test ),
                "<br />",
                "4: ",
                $q->div( {-id=>"id"},  $test ),
        $q->end_html();

}
```

The first block is fine, the second is broken, and the third and fourth are fine as well.

The same example as an ordinary CGI script prints all four blocks correctly:

```perl
#!/usr/bin/perl -w --

use strict;
use CGI qw(:all -utf8);
use locale;
use utf8;

BEGIN {
        binmode(STDIN);                       # Form data
        binmode(STDOUT, ':encoding(UTF-8)');  # HTML
        binmode(STDERR, ':encoding(UTF-8)');  # Error messages
}

my ($q) = ();
my $test = "õäöüžš";
$q = new CGI;

        print $q->header(-type=>"text/html", -charset=>"utf-8"),
                $q->start_html(-encoding=>"utf-8");

        print "1: ",
                $q->div(  $q->param('text') ),
                "<br />",
                "2: ",
                $q->div( {-id=>"id"},  $q->param('text') ),
                "<br />",
                "3: ",
                $q->div(  $test ),
                "<br />",
                "4: ",
                $q->div( {-id=>"id"},  $test ),
        $q->end_html();
```

It seems to me that under FastCGI the form data does not have the UTF-8 flag set, and I don't understand how to force it properly. Under CGI.pm I declare it as:

```perl
use CGI qw(:all -utf8);
```

But how do I do that with FastCGI?

w.k
  • There is no such thing as a "utf8 flag" at the Perl program level. If you have incoming data that is utf-8 encoded, you can convert it to a Perl string with the function `Encode::decode_utf8`. Actually, all text coming into the program must be decoded, with something like `Encode::decode('latin1', $data)` or `Encode::decode('ascii', $data)`. Raw strings are only for binary data. See tchrist's and my answers on http://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default – jrockway Aug 17 '11 at 19:14
  • @jrockway: That is why I have the binmode lines in the BEGIN block. Because I use CGI with the -utf8 flag, I can't apply `:encoding(UTF-8)` to STDIN: CGI.pm decodes the form data (from STDIN and from QUERY_STRING) itself. As chansen pointed out in his answer, binmode may have no effect under FastCGI, but this is certainly over my head ;) The question was about output line 2; the others were fine under both CGI and FastCGI. – w.k Aug 17 '11 at 20:12
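jrockway's advice to decode all incoming text at the boundary can be sketched without CGI. The helper `decode_params` below is hypothetical: it takes a hash of raw byte-string values (standing in, as an assumption, for what `$q->param` returns when CGI.pm is not told to decode) and returns character strings, dying on malformed UTF-8 instead of silently guessing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: decode every raw parameter value from UTF-8 bytes
# into a Perl character string. FB_CROAK dies on malformed input;
# LEAVE_SRC keeps the caller's byte strings untouched.
sub decode_params {
    my (%raw) = @_;
    my %decoded;
    for my $name (keys %raw) {
        $decoded{$name} = Encode::decode('UTF-8', $raw{$name},
            Encode::FB_CROAK | Encode::LEAVE_SRC);
    }
    return %decoded;
}

# "\xC3\xB5\xC3\xA4" is what a UTF-8 form submission sends for "õä".
my %raw    = (text => "\xC3\xB5\xC3\xA4");
my %params = decode_params(%raw);

printf "raw length: %d, decoded length: %d\n",
    length $raw{text}, length $params{text};
```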

3 Answers


1) CGI::Fast is a subclass of CGI.pm, so you can specify the same import arguments.

```perl
use CGI::Fast (-utf8);
```

2) FCGI streams are implemented using the older stream API, TIEHANDLE, so applying PerlIO layers with binmode() has no effect. The proper solution would be to encode your data before outputting it, but if that's not an option, I can offer this hotpatch:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;

use CGI::Fast (-utf8);
use FCGI      ();
use Encode    ();

my $enc = Encode::find_encoding('UTF-8');
my $org = \&FCGI::Stream::PRINT;
no warnings 'redefine';
local *FCGI::Stream::PRINT = sub {
    my @OUTPUT = @_;
    for (my $i = 1; $i < @_; $i++) {
        $OUTPUT[$i] = $enc->encode($_[$i], Encode::FB_CROAK|Encode::LEAVE_SRC);
    }
    @_ = @OUTPUT;
    goto $org;
};

my $literal = "õäöüžš";

while (my $q = CGI::Fast->new) {
    print $q->header(-type => "text/html", -charset => "UTF-8"),
          $q->start_html(-encoding => "UTF-8"),
          $q->p("Text 1:" . $literal),
          $q->p("Text 2:" . $q->param('text')),
          $q->end_html;
}
```
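The alternative described as the proper solution, encoding output explicitly before printing it, can be sketched without FCGI. In this sketch an in-memory filehandle is an assumed stand-in for the raw FastCGI output stream, and `emit` is a hypothetical helper that encodes every string to UTF-8 bytes before printing. Since only byte strings ever reach `print`, there is nothing left to trigger a "Wide character" warning.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode ();

my $enc = Encode::find_encoding('UTF-8');

# Hypothetical helper: encode character strings to UTF-8 bytes, then
# print the bytes to a handle that has no encoding layer.
sub emit {
    my ($fh, @strings) = @_;
    print {$fh} map { $enc->encode($_, Encode::FB_CROAK | Encode::LEAVE_SRC) } @strings;
}

# Assumption: an in-memory handle stands in for the FastCGI output stream.
open my $out, '>', \my $buffer or die "cannot open in-memory handle: $!";
emit($out, "<div>", "õäöüžš", "</div>");
close $out;

print length($buffer), " bytes written\n";
```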
chansen
  • Output of your script: `Text 1:õäöüžš Text 2:õäöüžš`. By the way, I have no idea what happened over the last 6 months (I update the system regularly), but now my test script yields "Wide character" warnings on 2 prints and dies on the 3rd example row (in the log I see just: [warn] mod_fcgid: cleanup zombie process #####). When I asked in February, the only problematic line in my script was the 2nd one (`$q->div( {-id=>"id"}, $q->param('text') )`). – w.k Aug 17 '11 at 19:55
  • I'm a core developer of FCGI and I understand how Perl has implemented support for Unicode. Could you please provide the output of the script I provided? – chansen Aug 17 '11 at 20:03
  • Versions of FCGI, CGI and Perl would also provide insight. – chansen Aug 17 '11 at 20:04
  • Perl 5.10.1, CGI.pm 3.43, FCGI.pm 0.71, CGI::Fast 1.07. Do you need the output as HTML? The output above is what I copied from the browser, so the second case is scrambled (decoded twice?) – w.k Aug 17 '11 at 20:23
  • I can reproduce your issue; you need to upgrade CGI.pm to at least version 3.47. The "Wide character" warnings you see are due to changes in FCGI; that's why I said above that you should encode your data before outputting it or use the hotpatch above. – chansen Aug 18 '11 at 06:55
  • I want to move away from CGI and clean up legacy code. Your hotpatch makes it more complicated, so there is no motivation to use it in production. I must look for other ways (I already am). Still, it answers my question, it was a very good insight, and I am really grateful for it. – w.k Aug 18 '11 at 21:52

With Git 2.27 (Q2 2020), Gitweb should fix the issue.

See commit 2ecfcde (29 Mar 2020) by Julien Moutinho (ju1m).
(Merged by Junio C Hamano -- gitster -- in commit 7a8bb6d, 22 Apr 2020)

gitweb: fix UTF-8 encoding when using CGI::Fast

Signed-off-by: Julien Moutinho

FCGI streams are implemented using the older stream API: TIEHANDLE, therefore applying PerlIO layers using binmode() has no effect to them.

The solution in this patch is to redefine the FCGI::Stream::PRINT function to use UTF-8 as output encoding, except within git_blob_plain() and git_snapshot() which must still output in raw binary mode.

This problem and solution were previously reported back in 2012:
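The approach described in the commit message (encode inside a wrapped PRINT, but pass bytes through untouched for raw binary output) can be sketched with a self-contained tied handle. `Demo::Stream` is a hypothetical stand-in for FCGI::Stream so the sketch runs without FCGI installed, and `$RAW_OUTPUT` is a hypothetical flag, not a name taken from the gitweb patch; gitweb's actual mechanism for its binary paths may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use Encode ();

# Hypothetical stand-in for FCGI::Stream: a tied handle that appends
# whatever PRINT receives to a buffer. Like the FCGI stream it has no
# PerlIO layers, so binmode() cannot add an encoding.
package Demo::Stream;
sub TIEHANDLE { my ($class, $buf) = @_; return bless { buf => $buf }, $class }
sub PRINT     { my $self = shift; ${ $self->{buf} } .= join '', @_; return 1 }
sub BINMODE   { return 1 }    # accepted, but has no effect

package main;

# Hypothetical flag: localize it around binary output.
our $RAW_OUTPUT = 0;

my $enc  = Encode::find_encoding('UTF-8');
my $orig = \&Demo::Stream::PRINT;
{
    no warnings 'redefine';
    # Wrap PRINT: encode text to UTF-8 unless raw output was requested.
    *Demo::Stream::PRINT = sub {
        my ($self, @args) = @_;
        @args = map { $enc->encode($_, Encode::FB_CROAK | Encode::LEAVE_SRC) } @args
            unless $RAW_OUTPUT;
        return $orig->($self, @args);
    };
}

my $out = '';
tie *OUT, 'Demo::Stream', \$out;

print OUT "<p>õä</p>";        # text: encoded to UTF-8 by the wrapper
{
    local $RAW_OUTPUT = 1;    # e.g. a blob or snapshot download
    print OUT "\x89PNG";      # binary: passed through unchanged
}
untie *OUT;

print length($out), " bytes written\n";
```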

VonC

I would do:

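```perl
use strict;
use warnings;
use utf8;
use CGI;
CGI->compile();
use CGI::Fast;

while (CGI::Fast->new) {
    # Maybe this is enough? $CGI::PARAM_UTF8 is the CGI.pm package
    # variable that the -utf8 import pragma sets.
    $CGI::PARAM_UTF8 = 1;
    my $q = CGI->new;
    # the rest of the code should work
}
```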
Беров