How to get the title of an article using regex?

Question

I want to get the title of an article from this page using regex and simplehtmldom: http://laperuanavegana.wordpress.com/about/

In this case the title is: Cómo preparar SEITÁN

Here is my regex:

$html = file_get_html($url);
preg_match_all("title=(.*?)",$html->innertext,$title);
echo "this is title ".$title[0][0]."<br>";

It would be helpful if anyone could help me find the bug.

[You shouldn't try to parse HTML with RegEx](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) — Bohemian, Aug 15 '11 at 05:46

Ray Toal · Accepted Answer · 2011-08-15T04:52:16.107

2

I think you need to look for text between <title> and </title>, not for text following title=.

For example:

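$html = "Sometext<title>Seitan</title>More text";
// The | characters are the pattern delimiters; (.*?) lazily captures the text between the tags.
preg_match_all('|<title>(.*?)</title>|',$html,$title);
// $title[0] holds the full matches, $title[1] holds the text captured by the first group.
echo "this is title ".$title[1][0]."<br>";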
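
If you would rather not rely on a regex at all (see the comment on the question), here is a minimal sketch that reads the same <title> element with PHP's built-in DOMDocument. The helper name extract_title and the sample markup are my own assumptions for illustration, not something from your page. Note also that loadHTML may misread non-ASCII characters unless the document declares its charset, so treat this as a starting point rather than a finished solution.

```php
<?php
// Hypothetical helper (name chosen for this example): returns the text of
// the first <title> element, or null when the document has none.
function extract_title(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }

    // textContent is the element's text without any markup.
    return trim($nodes->item(0)->textContent);
}

// Sample markup, assumed for the example.
$html = "<html><head><title>Seitan</title></head><body>More text</body></html>";
echo "this is title " . extract_title($html) . "<br>";
```

With simplehtmldom you would pass $html->innertext (as in the question) or the raw page source to the helper. Returning null for a missing <title> lets the caller decide how to handle pages without one.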

edited Aug 15 '11 at 04:52

answered Aug 15 '11 at 04:43

