PHP get text from tag with regex

Question

I want to get all the text from the tag below and put it into an array using a regex:

<div class="titr2"><a href="Name.asp?nid=2923">TEXT </a></div>

TEXT is UTF-8, and I cannot get it using a regex:

<meta charset='UTF-8' />
<?php
error_reporting(1);
$handle='http://www.namefa.ir/Names.asp?pn=3&sx=F&fc=%D8%A8';
$handle = file_get_contents($handle);
preg_match_all('<div class="titr2" href=".*">(.*)<a href=".*"></a></div>)siU', $string, $matching_data);
print_r($matching_data);
?>

Apart from your regular expression being syntactically wrong, you are looking for a `href` attribute on the `div` that’s not there … — CBroe, Jan 21 '14 at 10:04

Victor Bocharsky · Accepted Answer · 2014-01-21T11:32:08.057

2

Try this regex:

preg_match_all('/<div[^>]+class="titr2"[^>]*>\s*<a[^>]+>(.*?)<\/a>\s*<\/div>/si', $handle, $matching_data);
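A minimal sketch of this pattern in use, run against an inline sample instead of the live page. The `$html` variable, the second sample entry and the `trim` step are assumptions added for illustration. The key point is that the variable holding the HTML is the one passed to `preg_match_all`.

```php
<?php
// Sample markup modelled on the question. The second entry is a made-up
// UTF-8 value, used only to show that multibyte text is captured as well.
$html = '<div class="titr2"><a href="Name.asp?nid=2923">TEXT </a></div>'
      . '<div class="titr2"><a href="Name.asp?nid=1">بهار</a></div>';

// Same pattern as in the answer, using `/` as the delimiter.
$pattern = '/<div[^>]+class="titr2"[^>]*>\s*<a[^>]+>(.*?)<\/a>\s*<\/div>/si';

// The subject must be the variable that actually holds the HTML.
$count = preg_match_all($pattern, $html, $matching_data);

// Index 1 holds the text captured by the first group; trim() drops the
// trailing space in "TEXT ".
$texts = array_map('trim', $matching_data[1]);
print_r($texts);
```

For the real page, `$html` would be the string returned by `file_get_contents()`.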

edited Jan 21 '14 at 11:32

answered Jan 21 '14 at 10:07

Victor Bocharsky

Did you try this pattern? I already tested it myself and it works :) – Victor Bocharsky Jan 21 '14 at 10:30
Yes. This is the result: `Array ( [0] => Array ( ) [1] => Array ( ) )` – DolDurma Jan 21 '14 at 10:42
The result of `var_dump($matching_data);` is `array(2) { [0]=> array(0) { } [1]=> array(0) { } }`, and `var_dump($string)` is `NULL` – DolDurma Jan 21 '14 at 10:53
You have an empty string, so the result is empty. Assign your HTML code to the `$string` variable. – Victor Bocharsky Jan 21 '14 at 11:05
@TuxWorld Maybe you meant `$handle` instead of `$string` in the `preg_match_all` call? Please check the update. – Victor Bocharsky Jan 21 '14 at 11:31

score 1 · Answer 2 · edited May 23 '17 at 12:21

You shouldn't use regex to parse HTML; see "RegEx match open tags except XHTML self-contained tags".

You should really use an HTML parser instead.
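As a rough sketch of the parser route, assuming PHP's bundled DOM extension is available: the sample markup and the `$html`, `$doc`, `$xpath` and `$texts` names are illustrative, and the XML encoding prefix is a common workaround (not something from this thread) to make `loadHTML` read the input as UTF-8.

```php
<?php
// Sample markup modelled on the question; in practice this would be the
// string returned by file_get_contents().
$html = '<div class="titr2"><a href="Name.asp?nid=2923">TEXT </a></div>'
      . '<div class="other"><a href="#">skip me</a></div>';

// Collect libxml warnings internally instead of printing them, since
// real-world HTML is rarely perfectly valid.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// The XML encoding prefix hints that the input is UTF-8.
$doc->loadHTML('<?xml encoding="UTF-8">' . $html);
libxml_clear_errors();

// Select every <a> that is a direct child of <div class="titr2">.
$xpath = new DOMXPath($doc);
$texts = [];
foreach ($xpath->query('//div[@class="titr2"]/a') as $link) {
    $texts[] = trim($link->textContent);
}
print_r($texts);
```

The XPath test `@class="titr2"` matches only when the attribute is exactly `titr2`; a `div` carrying additional classes would need a looser expression.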

If this really is a one-time thing, limited to this case only, in a small HTML file that never changes, your regex is wrong:

<div class="titr2"><a href=".+?">(.+?)</a></div>

would be closer, and you should check out Victor's solution.
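The pattern is shown here without delimiters, which `preg_match_all` requires. A minimal sketch, assuming `~` as the delimiter (an arbitrary choice, not taken from the answer) so that the `/` in the closing tags needs no escaping; `$html` is an inline sample, not the live page:

```php
<?php
// Sample markup copied from the question.
$html = '<div class="titr2"><a href="Name.asp?nid=2923">TEXT </a></div>';

// `~` delimits the pattern, so `</a>` and `</div>` can be written as-is.
// The `s` modifier lets `.` match newlines inside the link text.
$pattern = '~<div class="titr2"><a href=".+?">(.+?)</a></div>~s';

preg_match_all($pattern, $html, $matching_data);

// Index 1 holds the text captured by the first group.
print_r($matching_data[1]);
```

This literal pattern only matches when the markup has exactly this attribute order and no whitespace between the tags.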

answered Jan 21 '14 at 10:06

Robin

You need to escape the `/` character, like `\/`. – Victor Bocharsky Jan 21 '14 at 10:17
@TuxWorld Try the regex in my answer. – Victor Bocharsky Jan 21 '14 at 10:20
@Victor, can you paste the full regex pattern? – DolDurma Jan 21 '14 at 10:21
@Robin: `preg_match_all('<div class="titr2"><a href=".+?">(.+?)</a></div>', $string, $matching_data);` – DolDurma Jan 21 '14 at 10:21
@TuxWorld I put the full regex in my answer. – Victor Bocharsky Jan 21 '14 at 10:23
Not necessarily, `/` isn't a regex reserved character. It depends on the pattern delimiter you're using, and here in the question there's just a typo `)` as a delimiter. @TuxWorld, you could probably use `'/<div class="titr2"><a href=".+?">(.+?)<\/a><\/div>/'` indeed. But really, the parser thing. – Robin Jan 21 '14 at 10:24
@Robin, that returns an empty array. – DolDurma Jan 21 '14 at 10:27
Victor's regex is more efficient, you should try it if you must. But one last time, regex is very probably not the tool you want. – Robin Jan 21 '14 at 10:38
