Although the output of pcretest -C on my server says that PCRE supports UTF-8, the following code always returns false, even when I pass a value that should match. It seems that UTF-8 characters are not recognized:

   $pattern = '/^\x{06F0}?\x{06F9}\d{9}$/u';
   if (!preg_match($pattern, $value)) { // $value is a function parameter
      return false;
   }
   return true;

Output of pcretest -C:

PCRE version 7.8 2008-09-05
Compiled with
  UTF-8 support
  Unicode properties support
  Newline sequence is LF
  \R matches all Unicode newlines
  Internal link size = 2
  POSIX malloc threshold = 10
  Default match limit = 10000000
  Default recursion depth limit = 10000000
  Match recursion uses stack

PHP version: 5.3.2

This code works as expected on my localhost.

Any suggestions?

Cœur
hpn

1 Answer

Works here (note html_entity_decode's charset default changed to UTF-8 in PHP 5.4):

$ cat a.php
<?php
$pattern = '/^\x{06F0}?\x{06F9}\d{9}$/u';
var_dump(preg_match($pattern, html_entity_decode('&#x6F9;123456789')));
$ php a.php 
int(1)
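
Because the default charset differs between PHP versions, one way to make the test version-independent is to pass the charset explicitly. This is a minimal sketch reusing the pattern and input from above; the `$value` variable and the `bin2hex` check are added here only for illustration:

```php
<?php
// Sketch: name the charset explicitly so the result does not depend on
// the PHP version's default (ISO-8859-1 before 5.4, UTF-8 from 5.4 on).
$pattern = '/^\x{06F0}?\x{06F9}\d{9}$/u';
$value = html_entity_decode('&#x6F9;123456789', ENT_COMPAT, 'UTF-8');

// U+06F9 should decode to the two UTF-8 bytes DB B9.
var_dump(bin2hex(substr($value, 0, 2)));

// With a correctly decoded UTF-8 string the pattern should match.
var_dump(preg_match($pattern, $value));
```

If the first dump shows anything other than `dbb9`, the string reaching `preg_match` is not the UTF-8 text the pattern expects.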

Note that PHP, by default, doesn't use the system PCRE library; it uses its own bundled copy (though many distros, for obvious reasons, build PHP against the system library). Run php -i and look for the PCRE section to see which version your binaries use.
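
The same check can be done from a script, which may be easier than php -i when you only have web access to the server. This is a rough sketch: `PCRE_VERSION` is a built-in PHP constant, while the `$utf8Ok` variable and the probe pattern are just illustrative choices:

```php
<?php
// Version of the PCRE library this PHP binary was built with
// (may differ from what pcretest -C reports for the system library).
echo 'PCRE used by PHP: ', PCRE_VERSION, "\n";

// With the /u modifier, "." matches one whole code point, so this
// matches the two-byte character U+06F9. If the PCRE build lacks UTF-8
// support, the pattern fails to compile and preg_match() returns false.
$utf8Ok = @preg_match('/^.$/u', "\xDB\xB9") === 1;
echo 'UTF-8 mode works: ', $utf8Ok ? 'yes' : 'no', "\n";
```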

Artefacto
  • Thanks. It seems that the problem is with the `html_entity_decode` function. I tested your code on localhost and on the production server: on localhost it returns `int(1)`, but on the server it returns `int(0)`. My code is part of a Drupal module, and Drupal uses the `html_entity_decode` function. Is there any way to solve the problem without updating PHP? – hpn Oct 02 '13 at 09:19
  • @hpn There's no "problem" with `html_entity_decode`; all that's changed is the default value of its third argument. If you do `html_entity_decode('&#x6F9;123456789', ENT_COMPAT, 'UTF-8')` the behavior should be the same in PHP 5.3 and 5.4. – Artefacto Oct 02 '13 at 09:23