5

I need a function in PHP that matches full words in Hebrew.

Please help.

Haim Bender

3 Answers

8

Try this regular expression, which uses a Unicode script property:

/\p{Hebrew}+/u
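
As a rough illustration of how this pattern might be used, the sketch below runs it through preg_match_all. The sample string and the variable names are assumptions made up for this example, and the input is assumed to be UTF-8 encoded.

```php
<?php
// Illustrative input (not from the question): Hebrew words around a Latin one.
$text = "שלום world עולם";

// \p{Hebrew} matches characters in the Hebrew script; the /u modifier
// tells PCRE to treat both the pattern and the subject as UTF-8.
preg_match_all('/\p{Hebrew}+/u', $text, $matches);

// $matches[0] holds each run of consecutive Hebrew characters.
print_r($matches[0]);
```

Here the Latin word and the spaces are skipped, so only the two Hebrew runs end up in $matches[0].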
NullUserException
Gumbo
5

Assuming your source data is UTF-8 encoded:

$input = "ט״סת תעסתינג O״ת סOמע העברעו תעחת";

preg_match_all( "/[\\x{0590}-\\x{05FF}]+/u", $input, $matches );

echo '<pre>';
print_r( $matches );
echo '</pre>';

Yields

Array
(
    [0] => Array
        (
            [0] => ט״סת
            [1] => תעסתינג
            [2] => ״ת
            [3] => ס
            [4] => מע
            [5] => העברעו
            [6] => תעחת
        )

)

I based the range U+0590 through U+05FF on this Unicode chart (edit: I found more good Hebrew/Unicode info here). I used this to generate my sample input. Since I don't know Hebrew, I can't verify that the matched output is valid.

You may need to tweak it, but hopefully this gets you headed in the right direction.

Peter Bailey
  • I just need to check whether a string is a single Hebrew word. Do I need to specify the start and end of the string in the regex? How can I implement this? – Haim Bender Dec 16 '09 at 18:56
  • Yeah, I suppose that would work - again I'm not going to pretend to understand the grammar of Hebrew. – Peter Bailey Dec 16 '09 at 19:13
  • @PeterBailey Is it possible to use this in my PHP script? http://www.iosart.com/nlp/heb_enc_dec.html – shmoolki May 04 '16 at 09:42
2

Thanks for all your answers,

The one that works for me is preg_match("/^\p{Hebrew}+$/u", "שלום");
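
Since the question asked for a function, here is a minimal sketch of one built around that pattern. The name is_hebrew_word is hypothetical (made up for this example), and the input is assumed to be UTF-8 encoded.

```php
<?php
// Hypothetical helper: true only when the whole string is a single run
// of Hebrew-script characters.
function is_hebrew_word(string $s): bool
{
    // ^ and $ anchor the match to the full string; /u enables UTF-8 mode.
    // preg_match returns 1 on a match, 0 on no match, false on error
    // (for example invalid UTF-8), so compare strictly against 1.
    return preg_match('/^\p{Hebrew}+$/u', $s) === 1;
}

var_dump(is_hebrew_word("שלום"));      // single Hebrew word
var_dump(is_hebrew_word("שלום עולם")); // two words separated by a space
var_dump(is_hebrew_word("hello"));     // Latin only
```

Note that without the D modifier, $ also matches just before a trailing newline, so a word followed by "\n" would still pass; using /uD instead would reject it.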

Haim Bender