
I have come across a peculiarity in a plperl stored procedure on Postgres 9.2 with Perl 5.12.4.

The curious behavior can be reproduced using this "broken" SP:

CREATE FUNCTION foo(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    $re = ''.qr/\b($re)\b/i;
    return $re;
$$ LANGUAGE plperl;

When executed:

# select foo('foo');
ERROR:  Unable to load utf8.pm into plperl at line 3.
BEGIN failed--compilation aborted.
CONTEXT:  PL/Perl function "foo"

However, if I move the qr// operation into an eval, it works:

CREATE OR REPLACE FUNCTION bar(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    eval "\$re = ''.qr/\\b($re)\\b/i;";
    return $re;
$$ LANGUAGE plperl;

Result:

# select bar('foo');
       bar       
-----------------
 (?^i:\b(foo)\b)
(1 row)

  1. Why does the eval bypass the automatic `use utf8`?

  2. Why is `use utf8` even required in the first place? My code is not written in UTF-8, which is said to be the only situation in which one should `use utf8`.

    If anything, I would expect the eval version to break without `use utf8` when the input to the script contains non-ASCII values. (Further testing shows that passing non-ASCII values to bar() does indeed cause the eval to fail with the same error.)


Note that many Postgres installations load utf8 automatically when the Perl interpreter starts. This is the default on Debian at least, as shown by listing the loaded modules with DO 'elog(WARNING, join ", ", sort keys %INC)' language plperl; which gives:

WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, strict.pm, unicore/Heavy.pl, unicore/To/Fold.pl, unicore/lib/Perl/SpacePer.pl, utf8.pm, utf8_heavy.pl, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO

But not so on the machine demonstrating the odd behavior:

WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, overloading.pm, strict.pm, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO

This question is not about how to get my target machine to load utf8 automatically; I know how to do that. I'm curious why it seems to be necessary in the first place.

Jonathan Hall
  • (Didn't quite work because you forgot to escape the `\ `. You should also escape the second `$`.) – ikegami Dec 03 '13 at 15:27
  • @ikegami: The second `$` should not be escaped, but good call on the `\ ` chars; now I get the proper output, which makes the question even more mysterious. – Jonathan Hall Dec 03 '13 at 15:32
  • Yes, it should. If you don't escape the second `$`, it will fail for `$re='/; system("rm -rf /"); qr/';`. Injection error! – ikegami Dec 03 '13 at 15:46
  • @ikegami: Indeed; that case can be ignored for the purpose of this test. Although escaping the `$` also causes the eval to fail in the same way. Hmmm. – Jonathan Hall Dec 03 '13 at 15:50
  • @ikegami: That causes it to die with the 'failed to load utf8.pm' as well. It seems perl is pretty smart about what type of string is passed to `qr//` or `eval()`. And by default, $re is a unicode string (by perl default, right?), so it behaves accordingly when executing `qr`. – Jonathan Hall Dec 03 '13 at 15:55

2 Answers


In the version that's failing, you're executing

$re = ''.qr/\b($re)\b/i

In the version that's succeeding, you're executing

$re = ''.qr/\b(foo)\b/i

It sounds like qr// needs utf8.pm when the pattern is compiled as a Unicode pattern (whatever that means), whereas the second pattern isn't compiled as a Unicode pattern.
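
One way to look at the difference between the two cases is to check Perl's internal UTF8 flag in plain Perl, outside PL/Perl. This is only a sketch: it assumes that PL/Perl hands text arguments to the function as UTF8-flagged strings, and it simulates that with `utf8::upgrade`. The names `$arg`, `$literal`, `$interpolated` and `$plain` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the argument PL/Perl hands to foo().
# Assumption: PL/Perl delivers it with the UTF8 flag set.
my $arg = 'foo';
utf8::upgrade($arg);    # core built-in, available without loading utf8.pm

# What the string-eval version has baked into its source text.
my $literal = 'foo';

printf "argument flagged: %s\n", utf8::is_utf8($arg)     ? 'yes' : 'no';
printf "literal flagged:  %s\n", utf8::is_utf8($literal) ? 'yes' : 'no';

# Interpolating the flagged value is what may make qr// build a Unicode pattern.
my $interpolated = qr/\b($arg)\b/i;
my $plain        = qr/\b(foo)\b/i;
print "interpolated: $interpolated\n";
print "plain:        $plain\n";
```

Both patterns match the same text here; the point is only that the interpolated value carries a flag the literal does not, which may be what sends qr// down the code path that wants utf8.pm.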


The failure to load utf8.pm is due to the limitations imposed by the Safe compartment created by plperl.

The fix is to load the module outside the Safe compartment.

The workaround is to use the more efficient

$re = '(?^u:\\b(?i:'.$re.')\\b)';
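
As a rough check that the string form behaves like the qr// form, the two can be compared in plain Perl. This is a sketch under assumptions: it needs Perl 5.14 or newer for the `(?^u:...)` syntax, and the sample inputs and the names `$compiled` and `$as_string` are made up for illustration.

```perl
use strict;
use warnings;
require 5.014;    # the (?^u:...) syntax is not available before Perl 5.14

my $re = 'foo';

# The original form: compiled immediately by qr//.
my $compiled = qr/\b($re)\b/i;

# The string form: only compiled when it is later used in a match.
my $as_string = '(?^u:\\b(?i:' . $re . ')\\b)';

for my $text ('a FOO b', 'afoob') {
    my $m1 = $text =~ $compiled     ? 1 : 0;
    my $m2 = $text =~ /$as_string/  ? 1 : 0;
    print "$text: qr=$m1 string=$m2\n";
}
```

Both forms accept and reject the same sample inputs. Note that the string form as written uses a non-capturing group, so `$1` is not set by it, and a malformed pattern is only detected when the string is eventually used in a match.
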
ikegami
  • I think you answered the question; although your work-around really isn't a work-around, as it doesn't do the same thing. – Jonathan Hall Dec 03 '13 at 15:59
  • Nope, it's the same even if `$re=qr/foo/i;` or `$re='/foo/i';`. (Not sure which one you meant.) The only difference that comes to mind is that it won't die if the pattern in `$re` is illegal. – ikegami Dec 03 '13 at 16:03
  • I'll investigate `qr` further. Thanks. – Jonathan Hall Dec 03 '13 at 16:05

I had the same issue and I fixed it by adding

plperl.on_init = 'use utf8; use re; package utf8; require "utf8_heavy.pl";'

to the postgresql.conf file.

I hope this will help someone.

Vajira Lasantha