
I am working on a simple search input. It splits terms on spaces, which works nicely. However, it does not recognize the space characters used in other languages.

I'd like to use preg_replace to convert those other spaces to a standard space.

For example:

$pattern       = array(
   //insert other language space codes here (I don't know what they are or how to find them) 
);
$replacement   = ' ';
$string        = "日本語 の スペース です";

$cleaned = preg_replace($pattern, $replacement, $string);
Trevor Wood
  • Did you try `preg_replace('/\s/', ' ', $string)`? Maybe the regex will catch other languages' spaces. – sjagr Nov 10 '14 at 17:56
  • @sjagr Unfortunately it didn't catch it. It only matches if I type that specific space character into the pattern, which I will probably do in the meantime. – Trevor Wood Nov 10 '14 at 18:00

1 Answer


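Use the u modifier in your pattern along with the \s escape sequence, which then matches any whitespace character, including non-ASCII ones such as the ideographic space (U+3000). Using your code, this would look something like the following (the replacement here is an empty string, so the spaces are removed; use ' ' to normalize them to a standard space instead):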

$pattern   = '/\s/u';
$replacement = '';
$string        = "日本語 の スペース です";

$cleaned = preg_replace($pattern, $replacement, $string);

var_dump($cleaned);

Output:

string(30) "日本語のスペースです"
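
If the goal is to normalize rather than strip the spaces, as the question asks, something along these lines might work. This is only a sketch: `normalize_spaces` is a hypothetical helper name, the sample string is assumed to contain U+3000 ideographic spaces, and the `"\u{...}"` escape assumes PHP 7.0 or later.

```php
<?php
// Hypothetical helper (not part of the original answer): collapse every run
// of whitespace, including Unicode spaces, into a single ASCII space.
function normalize_spaces(string $input): string
{
    // With the u modifier, \s also matches spaces such as U+3000.
    // preg_replace() returns null on error, so fall back to the raw input.
    $result = preg_replace('/\s+/u', ' ', $input) ?? $input;

    // After the replacement only ASCII spaces remain, so trim() is enough.
    return trim($result);
}

// Assumed sample input: terms separated by ideographic spaces (U+3000).
$string  = "日本語\u{3000}の\u{3000}スペース\u{3000}です";
$cleaned = normalize_spaces($string);

// The existing split-on-space logic can then run unchanged.
$terms = explode(' ', $cleaned);
```

Matching `\s+` instead of `\s` also folds consecutive spaces into one, which avoids empty terms after the split.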

From the manual:

u (PCRE_UTF8)

This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern and subject strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern and the subject is checked since PHP 4.3.5. An invalid subject will cause the preg_* function to match nothing; an invalid pattern will trigger an error of level E_WARNING. Five and six octet UTF-8 sequences are regarded as invalid since PHP 5.3.4 (resp. PCRE 7.3 2007-08-28); formerly those have been regarded as valid UTF-8.
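
The note about invalid subjects matters for user-supplied search input: with the u modifier, a subject that is not valid UTF-8 makes preg_replace return null instead of a string. A minimal sketch of detecting that case, where `$invalid` is a made-up malformed input used only for illustration:

```php
<?php
// Made-up example input: the byte 0xFF can never occur in valid UTF-8.
$invalid = "abc\xFFdef";

// With the u modifier the subject is validated before matching.
$result = preg_replace('/\s/u', ' ', $invalid);

// On a malformed subject, preg_replace() returns null and
// preg_last_error() reports PREG_BAD_UTF8_ERROR.
$failed = ($result === null && preg_last_error() === PREG_BAD_UTF8_ERROR);

if ($failed) {
    // How to recover is an application decision; this sketch just
    // keeps the original input untouched.
    $result = $invalid;
}
```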

Jeff Lambert