
I need to extract two things from an HTML file:

  1. The text between <title> and </title>
  2. The text between <body> and </body>

Does anybody know how to do this? This is what I have so far:

$contents = file_get_contents($_GET['file']);
$title = preg_replace("/.*<title[^>]*>|<\/title>.*/si", "", $contents);
$body = preg_replace("/.*<body[^>]*>|<\/body>.*/si", "", $contents);

I need to echo the title in a text box and the body in a textarea.

Josh Darnell
Tanner Ottinger
  • *(related)* [Best Methods to parse HTML](http://stackoverflow.com/questions/3577641/best-methods-to-parse-html/3577662#3577662) – Gordon Dec 15 '10 at 20:15
  • Read [Parsing Html The Cthulhu Way](http://www.codinghorror.com/blog/2009/11/parsing-html-the-cthulhu-way.html) – AlexV Dec 15 '10 at 20:33

1 Answer


Do not use regex to parse HTML. See this answer. Instead, use DOMDocument::loadHTML.
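
A minimal sketch of what that could look like, assuming the HTML is already loaded into a string. The function name extractTitleAndBody and the sample markup are hypothetical, made up for illustration; only the DOMDocument and libxml calls are standard PHP. It returns the title as plain text and the body as its inner markup, then escapes both before echoing them into form fields.

```php
<?php
// Hypothetical helper (not from the original answer): pulls the <title> text
// and the inner HTML of <body> out of an HTML string using DOMDocument.
function extractTitleAndBody(string $html): array
{
    // loadHTML() rejects an empty string, so return early in that case.
    if ($html === '') {
        return ['title' => '', 'body' => ''];
    }

    $doc = new DOMDocument();

    // Real-world HTML is often malformed; collect parser warnings quietly
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titleNode = $doc->getElementsByTagName('title')->item(0);
    $bodyNode  = $doc->getElementsByTagName('body')->item(0);

    $title = $titleNode ? trim($titleNode->textContent) : '';

    // Serialize each child of <body> to get the inner HTML without the
    // <body> tags themselves.
    $body = '';
    if ($bodyNode) {
        foreach ($bodyNode->childNodes as $child) {
            $body .= $doc->saveHTML($child);
        }
    }

    return ['title' => $title, 'body' => trim($body)];
}

// Usage with an illustrative sample document (an assumption, not the
// asker's file).
$html = '<html><head><title>Sample page</title></head>'
      . '<body><p>Hello</p></body></html>';
$parts = extractTitleAndBody($html);

// Escape before output so the markup shows up as text inside the fields.
echo '<input type="text" value="' . htmlspecialchars($parts['title']) . '">';
echo '<textarea>' . htmlspecialchars($parts['body']) . '</textarea>';
```

In the question's setup, the string passed in would be the result of file_get_contents. Passing $_GET['file'] straight to file_get_contents lets a visitor read arbitrary files, so the file name is worth validating first.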

Community
asthasr