
I have a small problem with a pattern for retrieving the title of a remote page: the same pattern returns a result for one URL but not for another. Here is a sample.

preg_match_all('|<title>(.*)</title>|U',$this->data,$title);

is used for one URL and returns the title, while the following returns an empty array:

preg_match_all('|<title>(.*)</title>|U', $valD, $title);

Can anyone tell me whether there is a problem with these lines?

Here `$this->data` and `$valD` hold the content of two different URLs on different servers.

Please help. I tried to solve it myself but failed, so I am asking you to point out my mistake.

Thank you.

Gordon
Parag Chaure
  • Have a look at [this question](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags). Use an HTML parser to convert the document into DOM and then use DOM traversal methods or XPath. – Felix Kling Oct 17 '11 at 07:36
  • Have you initialized `$title` to an empty array? Have you turned on error reporting (`error_reporting(E_ALL | E_NOTICE)`)? – knittl Oct 17 '11 at 07:39
  • The problem must be that your regex isn't matching for one site. Can you post the snippet of HTML that isn't matching, including the `<title>` tag for that page? – nickb Oct 17 '11 at 07:39
  • Parsing HTML with regexen is brittle. Don't do it except if you have absolute control over the remote page AND only for elements which can't nest recursively. – nalply Oct 17 '11 at 07:45
  • possible duplicate of [Grabbing title of a website using DOM](http://stackoverflow.com/questions/5869925/grabbing-title-of-a-website-using-dom) – Gordon Oct 17 '11 at 07:56
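
The DOM approach suggested in the comments above can be sketched roughly as follows. This is only a minimal illustration, assuming PHP's bundled DOM extension is available; the helper name `extractTitleWithDom` and the sample markup are hypothetical and not taken from the question.

```php
<?php
// Hypothetical helper: returns the trimmed <title> text, or null if none is found.
function extractTitleWithDom(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The HTML parser normalises tag names, so <TITLE> is found as well.
    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }

    return trim($nodes->item(0)->textContent);
}

// Assumed sample input: uppercase tag and line breaks inside the title.
$sample = "<html><head><TITLE>\nThe page title\n</TITLE></head><body></body></html>";
var_dump(extractTitleWithDom($sample));
```

Because the parser handles tag case, attributes, and line breaks itself, this avoids the per-site pattern tweaks that the regex versions need.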

2 Answers


If you are matching HTML, you should also expect uppercase tags such as `<TITLE>`, so add the `i` (case-insensitive) modifier after the closing `|` delimiter.

The title tag might also contain newlines, which `.` does not match by default, so the `s` modifier should be present as well.

 preg_match_all('|<title>(.*)</title>|Uis', ...
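
To see the effect of the two extra modifiers, here is a small sketch that runs both patterns on the same input. The sample HTML string and the variable names are assumptions made up for illustration; only the patterns come from the question and this answer.

```php
<?php
// Hypothetical sample: uppercase tags plus a title spread over several lines.
$html = "<HTML><HEAD><TITLE>\nThe page title\n</TITLE></HEAD></HTML>";

// Original modifiers (U only): case-sensitive, and "." stops at newlines.
$strictCount = preg_match_all('|<title>(.*)</title>|U', $html, $strictMatches);

// With i (case-insensitive) and s ("." also matches newlines) added.
$relaxedCount = preg_match_all('|<title>(.*)</title>|Uis', $html, $relaxedMatches);

// The captured text still includes the surrounding line breaks, so trim it.
$titles = array_map('trim', $relaxedMatches[1]);

var_dump($strictCount, $relaxedCount, $titles);
```

On this input the first call finds no match and the second finds one, which is the same symptom as in the question: one page works and the other yields an empty result.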
mario

Maybe there is a line break inside the title tag, like this:

<title>
The page title
</title>

Try

preg_match_all('|<title>[[:space:]]*(.*)[[:space:]]*</title>|U', $valD, $title);

instead.
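
A minimal sketch of this whitespace-tolerant idea, wrapped in a helper that also checks the return value of `preg_match`. The function name `firstTitle` and the added `i` modifier are assumptions for illustration. Note that a POSIX class such as `[:space:]` only compiles when written inside a bracket expression, as `[[:space:]]`.

```php
<?php
// Hypothetical helper: returns the first title found, or null on no match or error.
function firstTitle(string $html): ?string
{
    // [[:space:]]* absorbs line breaks and padding around the title text.
    $pattern = '|<title>[[:space:]]*(.*)[[:space:]]*</title>|Ui';

    // preg_match() returns 1 on a match, 0 on no match, false on a pattern error.
    if (preg_match($pattern, $html, $match) !== 1) {
        return null;
    }

    return $match[1];
}

// Same shape as the example above: the title text sits on its own line.
$valD = "<title>\nThe page title\n</title>";
var_dump(firstTitle($valD));
```

This variant only helps when the title text itself stays on a single line; a title whose text contains a line break still needs the `s` modifier.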

Alex