
I'm trying to figure out how to tell whether a string contains any Hebrew characters, with no luck so far.

How can this be done?

Lior
    I think this link will help you: http://stackoverflow.com/questions/1694350/how-can-i-detect-hebrew-characters-both-iso8859-8-and-utf8-in-a-string-using-php – Cas van Noort Dec 18 '11 at 00:53

3 Answers


If the source string is UTF-8 encoded, the simpler approach is to use \p{Hebrew} in the regex.

The pattern also needs the /u modifier, so that PCRE treats the string as UTF-8.

preg_match("/\p{Hebrew}/u", $string); // returns 1 if $string contains a Hebrew character, 0 if not
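A minimal sketch wrapping that call in a reusable function. The name `containsHebrew` is hypothetical, and throwing on invalid input is one possible design choice: with /u, `preg_match()` returns `false` (not 0) when the subject is not valid UTF-8, which would otherwise be silently treated as "no Hebrew".

```php
<?php
// Hypothetical helper name; wraps the preg_match() call from the answer.
function containsHebrew(string $s): bool
{
    // \p{Hebrew} matches any code point in the Hebrew script;
    // /u makes PCRE treat $s as UTF-8 instead of raw bytes.
    $result = preg_match('/\p{Hebrew}/u', $s);
    if ($result === false) {
        // preg_match() returns false on error, e.g. when $s is not valid UTF-8.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $result === 1;
}

var_dump(containsHebrew('hello'));          // bool(false)
var_dump(containsHebrew("abc \u{05D0}"));   // bool(true), U+05D0 is HEBREW LETTER ALEF
```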
mario
  • Doesn't this miss a `\` in `\\p`? – fge Dec 18 '11 at 09:39
  • @fge: If you want to be super correct :) But `"\p"` is no C-string escape, so will correctly reach the PCRE library as `\p` – mario Dec 18 '11 at 09:44
  • Hmm, so you don't need to escape backslashes in PHP's string literals? I didn't know that. – fge Dec 18 '11 at 09:51
  • @fge: There are only a few you need to escape. For example `"\r\n\t"`. Or otherwise use single quotes where all lose their special meaning. – mario Dec 18 '11 at 09:53
  • Well yes, but here you use double quotes to surround `/\p{Hebrew}/u`. I didn't say the regex wasn't correct, I was simply guessing that a `\` was missing, no? – fge Dec 18 '11 at 09:55
  • Yes, talking about PHP string escapes. In PHP double quotes only `"\n"` and `"\r"` get transliterated into linebreaks in the actual variable value. But `"\p"` has no special meaning, so will remain `\p` in the actual variable value. See http://www.php.net/manual/en/language.types.string.php#language.types.string.syntax.double – mario Dec 18 '11 at 09:58
  • OK, that makes sense! Gee, all languages seem to do that differently :p Thanks for your time! – fge Dec 18 '11 at 10:00
  • No worries. PHP is especially peculiar there. And I just noticed the manual explains zilch. Will fix that tomorrow... – mario Dec 18 '11 at 10:01
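The point settled in the comments can be checked directly. A minimal sketch comparing the double-quoted and single-quoted forms of the pattern (the variable names are arbitrary):

```php
<?php
// In double quotes, "\p" is not a recognised escape sequence, so PHP keeps
// the backslash and the pattern reaching PCRE is identical to the
// single-quoted form.
$double = "/\p{Hebrew}/u";
$single = '/\p{Hebrew}/u';
var_dump($double === $single);              // bool(true)

// Writing "\\p" is also fine: "\\" is the escape for one literal backslash.
var_dump("/\\p{Hebrew}/u" === $single);     // bool(true)

// By contrast, "\n" IS an escape in double quotes, but not in single quotes.
var_dump(strlen("\n"), strlen('\n'));       // int(1), then int(2)
```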

According to the map of the ISO-8859-8 character set, the byte range E0 to FA is reserved for the Hebrew letters:

[\xE0-\xFA]
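A minimal sketch of that byte-range check, assuming the input really is ISO-8859-8 (the helper name `containsHebrewIso88598` is hypothetical). There is no /u modifier here, because the subject is a single-byte encoding. It also shows a caveat: applied to UTF-8 text, the same range gives false positives, since UTF-8 uses E0 to EF as lead bytes of three-byte sequences.

```php
<?php
// Hypothetical helper: checks raw ISO-8859-8 bytes for a Hebrew letter.
// No /u modifier: the subject is a single-byte encoding, not UTF-8.
function containsHebrewIso88598(string $bytes): bool
{
    return preg_match('/[\xE0-\xFA]/', $bytes) === 1;
}

// "shalom" encoded as ISO-8859-8: shin F9, lamed EC, vav E5, final mem ED.
var_dump(containsHebrewIso88598("\xF9\xEC\xE5\xED"));  // bool(true)
var_dump(containsHebrewIso88598('plain ASCII'));      // bool(false)

// Caveat: the euro sign in UTF-8 is E2 82 AC, and E2 falls inside E0-FA.
var_dump(containsHebrewIso88598("\u{20AC}"));         // bool(true), a false positive
```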

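For UTF-8 strings, the Unicode block reserved for Hebrew is U+0590 to U+05FF. PHP's PCRE does not support the \u escape, so the code points are written as \x{...} and the pattern needs the /u modifier:

[\x{0590}-\x{05FF}]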

Here's an example of a regex match in PHP:

echo preg_match('/[\x{0590}-\x{05FF}]/u', $string);
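A minimal sketch of the Unicode-block check as a function (the name `containsHebrewBlock` is hypothetical). Unlike a letters-only range, the whole block also covers vowel points and cantillation marks, which may or may not be what you want.

```php
<?php
// Hypothetical helper: true if $s (UTF-8) contains any code point in the
// Unicode Hebrew block, U+0590 to U+05FF.
// \x{XXXX} together with /u is how PCRE spells a code point.
function containsHebrewBlock(string $s): bool
{
    return preg_match('/[\x{0590}-\x{05FF}]/u', $s) === 1;
}

var_dump(containsHebrewBlock("shalom \u{05E9}\u{05DC}\u{05D5}\u{05DD}")); // bool(true)
var_dump(containsHebrewBlock('shalom'));                                   // bool(false)
// The block also covers vowel points, e.g. U+05B8 HEBREW POINT QAMATS.
var_dump(containsHebrewBlock("\u{05B8}"));                                 // bool(true)
```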
JosephRuby

The simplest approach would be:

preg_match('/[א-ת]/u', $string)

For example,

$strings = array("abbb", "1234", "aabbאאבב", "אבבבב");

foreach ($strings as $string)
{
    echo "'$string'  ";

    // /u makes PCRE read the pattern and the subject as UTF-8 code points
    echo preg_match('/[א-ת]/u', $string) ? "has Hebrew characters in it." : "has no Hebrew characters.";

    echo "<br />";
}
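The pattern contains literal UTF-8 characters, so whether PCRE reads it as bytes or as code points depends on the /u modifier. A minimal sketch (the sample strings are chosen only for illustration) of what goes wrong without it on text that contains no Hebrew at all:

```php
<?php
// "café" and Russian "да": neither contains Hebrew.
$samples = ["caf\u{00E9}", "\u{0434}\u{0430}"];

foreach ($samples as $s) {
    // Without /u, PCRE sees the UTF-8 bytes of א (D7 90) and ת (D7 AA) as
    // separate characters, so the class becomes {D7, 90-D7, AA} and matches
    // many non-Hebrew lead bytes (é is C3 A9, д is D0 B4).
    $bytes = preg_match('/[א-ת]/', $s);
    // With /u, the class is the code point range U+05D0 to U+05EA.
    $utf8 = preg_match('/[א-ת]/u', $s);
    echo "$s: without /u => $bytes, with /u => $utf8\n";
}
```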
reshetech