
I want to get all the text from the tag below and put it into an array using a regex:

<div class="titr2"><a href="Name.asp?nid=2923">TEXT </a></div>

TEXT is UTF-8 encoded, and I cannot capture it with my regex.

<meta charset='UTF-8' />
<?php
error_reporting(1);
$handle='http://www.namefa.ir/Names.asp?pn=3&sx=F&fc=%D8%A8';
$handle = file_get_contents($handle);
preg_match_all('<div class="titr2" href=".*">(.*)<a href=".*"></a></div>)siU', $handle, $matching_data);
print_r($matching_data);
?>
DolDurma
  • Apart from your regular expression being syntactically wrong, you are looking for a `href` attribute on the `div` that’s not there … – CBroe Jan 21 '14 at 10:04

2 Answers


Try this regex:

preg_match_all('/<div[^>]+class="titr2"[^>]*>\s*<a[^>]+>(.*?)<\/a>\s*<\/div>/si', $handle, $matching_data);
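To see what that pattern captures without fetching the live page, here is a minimal, self-contained sketch run against a hard-coded copy of the markup from the question. The two Persian names are hypothetical sample text, and the `u` (UTF-8) modifier and the `trim()` call are additions that are not part of the answer above.

```php
<?php
// Hypothetical sample markup in the same shape as the question's page;
// the link texts are made-up UTF-8 Persian names.
$handle = '<div class="titr2"><a href="Name.asp?nid=2923">بهار </a></div>'
        . '<div class="titr2"><a href="Name.asp?nid=2924">بیتا</a></div>';

// Same pattern as the answer, plus the "u" modifier so PCRE treats the
// pattern and the subject as UTF-8 characters rather than raw bytes.
preg_match_all(
    '/<div[^>]+class="titr2"[^>]*>\s*<a[^>]+>(.*?)<\/a>\s*<\/div>/siu',
    $handle,
    $matching_data
);

// Capture group 1 holds the link texts; trim() drops the trailing space after the first name.
$names = array_map('trim', $matching_data[1]);
print_r($names);
```

`$matching_data[0]` holds the full matched `<div>` blocks, while `$matching_data[1]` holds only the text inside each `<a>`, which is the array the question asks for.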
Victor Bocharsky

You shouldn't use regex to parse HTML; see the Stack Overflow question "RegEx match open tags except XHTML self-contained tags".

You should really use an HTML parser instead.
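As a rough illustration of the parser approach, the sketch below uses PHP's built-in `DOMDocument` and `DOMXPath` on hypothetical sample markup shaped like the question's. Prepending a `<meta http-equiv="Content-Type">` tag is one common workaround, assumed here, for getting `loadHTML()` to read the input as UTF-8; without a declared charset, libxml's HTML parser may decode the bytes as ISO-8859-1 and garble the Persian text.

```php
<?php
// Hypothetical sample markup; in the question this would be the string
// returned by file_get_contents().
$html = '<div class="titr2"><a href="Name.asp?nid=2923">بهار </a></div>'
      . '<div class="titr2"><a href="Name.asp?nid=2924">بیتا</a></div>';

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// Declare the charset up front so the parser decodes the input as UTF-8.
$dom->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
$names = [];
// Select every <a> that is a direct child of <div class="titr2">.
foreach ($xpath->query('//div[@class="titr2"]/a') as $link) {
    $names[] = trim($link->textContent);
}
print_r($names);
```

Unlike the regex versions, this keeps working if attributes are reordered, extra whitespace or newlines appear between tags, or the `<div>` gains more classes (in that last case the XPath test would need `contains(@class, "titr2")` instead of an exact match).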

If this really is a one-off, limited to this single case and a small HTML file that never changes, note that your regex is wrong:

<div class="titr2"><a href=".+?">(.+?)</a></div>

would be closer, and you should check out Victor's solution.
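The pattern above is written without PHP delimiters. A minimal sketch of using it with `preg_match_all()` might look like this. The `#` delimiters (chosen so the `/` in the closing tags needs no escaping), the `u` modifier, and the sample names are assumptions for illustration, not part of the answer.

```php
<?php
// Hypothetical sample markup; the real page contains many such <div> blocks.
$html = '<div class="titr2"><a href="Name.asp?nid=2923">بهار </a></div>'
      . '<div class="titr2"><a href="Name.asp?nid=2924">بیتا</a></div>';

// The answer's pattern wrapped in "#" delimiters; "u" makes PCRE match UTF-8 characters.
$pattern = '#<div class="titr2"><a href=".+?">(.+?)</a></div>#u';

// preg_match_all() returns false on an invalid pattern or invalid UTF-8 subject.
if (preg_match_all($pattern, $html, $matches) === false) {
    exit('Invalid pattern or subject');
}
print_r($matches[1]);
```

Note that this version keeps the trailing space from the first link text, and it only matches when the tags sit on one line with exactly this attribute layout, which is why the parser-based approach is the safer choice.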

Robin