Preg matching Arabic?

Question

I'm trying to use preg_match on a link that is half in English and half in Arabic.

The link as an example looks like:

"/<arabic>/123/<arabic>-<english>.html"

The basic preg_match('@<a href="/(.*?).html" >); returns everything, but the Arabic part of the URL comes back garbled, for example as "Ø¯Ø§Ù†Ù„ÙˆØ¯-Ø±Ø§ÛŒÚ", so it no longer identifies a page.

I've tried things I've seen suggested, such as \p{Arabic}, but that returns nothing. Is there a way to capture these links?

I'm pretty stumped and can't figure out a way around this.

Edit: adding the preg_match_all call and the text I'm trying to match.

preg_match_all('@<a href="/\p{Arabic}/(.*?)/\p{Arabic}-(.*?)" >@iu',$page,$link);

Example text:

"a href="/دانلود-رایگان-کتاب/کتاب-های-خارجی/مطلب/2120-the-essential-financial.html"

could you include a code snippet including the regular expression and sample text you're trying to match against? — Jeff Lambert, Nov 11 '14 at 17:02
this post may help : http://stackoverflow.com/questions/12046526/preg-replace-and-preg-match-arabic-characters — teeyo, Nov 11 '14 at 17:02
I have just edited in the code and example text. Thanks for the link, teeyo. I did see that, but wasn't sure whether you had to know which characters were required. I will look into it now. — , Nov 11 '14 at 17:13

score 0 · Answer 1 · edited May 23 '17 at 11:59

Think twice before using regex to parse HTML. A DOM parser such as DOMDocument can read the href attribute of each link directly:

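// $yourHTML is assumed to hold the page markup as a string.
$doc = new DOMDocument();
$doc->loadHTML($yourHTML);

// Collect every <a> element and print its href attribute, one per line.
$links = $doc->getElementsByTagName('a');

foreach ($links as $link) {
  echo $link->getAttribute('href'), PHP_EOL;
}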
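
One caveat when applying this to the URLs in the question: loadHTML may misread UTF-8 markup that declares no charset, which can produce garbled text much like the sample output in the question. The sketch below is one possible way around that, not part of the original answer. It assumes the page is UTF-8, uses the sample link from the question as a stand-in for the real page ($html and $hrefs are illustrative names), and relies on the commonly used workaround of prefixing an XML encoding hint. It then keeps only the links that contain at least one Arabic-script character, using \p{Arabic} with the u modifier.

```php
<?php
// Stand-in for the real page: the sample link from the question, as UTF-8.
$html = '<a href="/دانلود-رایگان-کتاب/کتاب-های-خارجی/مطلب/2120-the-essential-financial.html">Book</a>';

// Collect parser warnings quietly instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Assumption: the markup is UTF-8. The encoding hint asks libxml to read it
// as such instead of falling back to a single-byte default.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

$hrefs = [];
foreach ($doc->getElementsByTagName('a') as $link) {
    // Decode percent-encoded paths so the script test sees real characters.
    $href = rawurldecode($link->getAttribute('href'));

    // \p{Arabic} matches a single Arabic-script character; the u modifier
    // makes PCRE treat both pattern and subject as UTF-8.
    if (preg_match('/\p{Arabic}/u', $href) === 1) {
        $hrefs[] = $href;
    }
}

print_r($hrefs);
```

Matching on a single \p{Arabic} character is deliberate here: the pattern in the question expects one Arabic character per path segment, while the real segments are longer and also contain hyphens, which are not Arabic script.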

answered Nov 11 '14 at 21:36 by dynamic
