I have some code in a PHP application that returns null on the production server but works fine on the development server. Here is the line of code:

// use the regex unicode support to separate the UTF-8 characters into an array
preg_match_all( '/./us', $str, $match );

What does the u flag depend on? I tested with mbstring enabled and disabled, and it does not seem to affect it.

The error I'm getting is:

preg_match_all: Compilation failed: unknown option bit(s) set at offset -1

More info:

This is one of the configure options on the production server:

'--with-pcre-regex=/opt/pcre'

And here are the pcre sections:

[Image: Picture.png]

I believe this is the note @Wesley was referring to:

In  order  process  UTF-8 strings, you must build PCRE to include UTF-8
support in the code, and, in addition,  you  must  call  pcre_compile()
with  the  PCRE_UTF8  option  flag,  or the pattern must start with the
sequence (*UTF8). When either of these is the case,  both  the  pattern
and  any  subject  strings  that  are matched against it are treated as
UTF-8 strings instead of strings of 1-byte characters.

Any links or tips on how to "build PCRE to include UTF-8"?

Results of `pcretest -C`:

PCRE version 6.6 06-Feb-2006
Compiled with
  UTF-8 support
  Unicode properties support
  Newline character is LF
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack
Glorfindel
cwd
  • Are you testing against a variable or a constant value? Your example has a variable, I think you should test against a constant to ensure you're doing the same on dev and live. – hakre Sep 17 '11 at 18:33
  • http://php.net/manual/en/reference.pcre.pattern.modifiers.php, see the comments as well. – Wesley Murch Sep 17 '11 at 18:34

2 Answers

This flag depends on PCRE being built with Unicode support enabled.

PHP bundles this library, and the bundled copy is normally built with Unicode support enabled: the u modifier has been available since PHP 4.1.0 and always works when PHP is built with the bundled PCRE library.

However, some Linux distributions build PHP against their own build of PCRE, which does not have Unicode support enabled, and as a result the u modifier doesn't work on those builds.

The solution is to use an alternative PHP package, one built against the bundled PCRE or against a PCRE library that has Unicode support.
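
If switching packages is not possible, one workaround is to avoid the u modifier and split the string with a byte-oriented pattern instead. The following is only a minimal sketch: it assumes the input is already valid UTF-8, and `utf8_split_bytes` is a hypothetical helper name, not a PHP function.

```php
<?php
// Hypothetical helper: split a valid UTF-8 string into characters
// without relying on the u modifier (so no PCRE UTF-8 support is needed).
function utf8_split_bytes($str)
{
    // Either one ASCII byte, or one lead byte (0xC2-0xF4) followed by
    // one to three continuation bytes (0x80-0xBF).
    // Assumption: $str is valid UTF-8; malformed bytes are silently skipped.
    preg_match_all('/[\x00-\x7F]|[\xC2-\xF4][\x80-\xBF]{1,3}/', $str, $match);
    return $match[0];
}

// Usage: "a", "é" (C3 A9) and "€" (E2 82 AC) come back as three elements.
$chars = utf8_split_bytes("a\xC3\xA9\xE2\x82\xAC");
echo count($chars), "\n";
```

Because the pattern works on raw bytes, it behaves the same on both servers regardless of how PCRE was built. It does no validation, so it is not a substitute for the u modifier when the input may be malformed.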

Arnaud Le Blanc
  • What do I look for to know if it was enabled and what language do I use to tell the sysadmin I need it available? I believe the system is based on CentOS. – cwd Sep 17 '11 at 18:37
  • You could look for the PCRE_UTF8 macro being defined or not in /usr/include/pcre.h – Arnaud Le Blanc Sep 17 '11 at 18:43
  • i don't have root access on the production machine, but i can get into /usr/include and pcre.h is not there. also updated the question with more info. – cwd Sep 17 '11 at 18:48
  • added the results to the question. looks like it is not the same as that bug report since it claims `Unicode properties support` – cwd Sep 17 '11 at 19:38
  • maybe php is linked with another pcre on the system; try running `ldd /usr/bin/php` and look at which libpcre it's linked against – Arnaud Le Blanc Sep 17 '11 at 19:44
  • `ldd /usr/bin/php | grep pcre` gives me `libpcre.so.0 => /lib/libpcre.so.0`. not sure what to do with that. if i execute `/lib/libpcre.so.0` it tells me `Segmentation fault` – cwd Sep 17 '11 at 20:06
It depends on PCRE being compiled with --enable-utf8.
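
Since this build option does not necessarily show up in phpinfo(), one way to check is to ask PHP itself whether the library it is linked against accepts the u modifier. This is a rough sketch rather than an official diagnostic; `pattern_works` is a hypothetical helper name, and the check relies on preg_match() returning false when a pattern fails to compile.

```php
<?php
// PCRE_VERSION is a built-in constant with the version of the PCRE
// library that this PHP binary is actually using.
echo 'PCRE version: ', PCRE_VERSION, "\n";

// Hypothetical helper: true only if the pattern compiles and matches.
// preg_match() returns false and raises a warning when compilation fails;
// the warning is suppressed with @ so only the result is inspected.
function pattern_works($pattern, $subject)
{
    return @preg_match($pattern, $subject) === 1;
}

// "\xC3\xA9" is the two-byte UTF-8 encoding of "é".
$hasUtf8 = pattern_works('/^.$/u', "\xC3\xA9");   // one character in UTF-8 mode
$hasUcp  = pattern_works('/^\pL$/u', "\xC3\xA9"); // Unicode property: letter

echo 'u modifier (UTF-8): ', $hasUtf8 ? 'yes' : 'no', "\n";
echo 'Unicode properties: ', $hasUcp ? 'yes' : 'no', "\n";
```

Running this on both the development and production servers should show which PCRE version each one really uses and whether that library was built with UTF-8 support.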

Tom
  • this sounds right - can you elaborate? I don't see that in the php_info for the development or production machine. – cwd Sep 17 '11 at 18:35