
I want to get the content (with all CSS, links working and so on) of a specific part of a web page, which is inside <div id="some-content"></div>.

preg_match("/<div id=\'some-content\'>(.*)<\/div>/m", file_get_contents('www.xxx.com'), $output);
print_r ($output);

But it returns an empty array: Array ( )

What is wrong? Is it a problem with preg_match or with the web page?

Lina
  • In case you really want to work with xxx.com (a porn site btw), the simple answer is that there is no div with an id "some-content", hence you get an empty array. In case it was meant as an example, you are encouraged to change the URL to example.com (which is the official URL for any URL examples). Even then, though, the answer is likely that there is no such div or it doesn't occur in that exact syntax. You can more reliably scrape HTML with an HTML parser. See http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php – Gordon Feb 23 '12 at 13:37
  • There is no way to account properly for all possible contents; you should use an HTML parser, e.g. [PHP FAQ](http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php) – scibuff Feb 23 '12 at 12:50
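To illustrate the point that the div may simply not occur in that exact syntax, here is a minimal sketch run against a hypothetical snippet (the markup below is an assumption, not the real page). The question's pattern requires single quotes around the id and, without the `/s` modifier, `.` cannot cross a newline; a looser pattern copes with both. Separately, `file_get_contents('www.xxx.com')` has no `http://` scheme, so PHP treats the argument as a local file path: unless a file with that name exists, the call fails and returns false before the regex even runs.

```php
<?php
// Hypothetical markup standing in for the real page: the id uses double quotes
// and the div content spans several lines.
$html = "<html><body>\n<div id=\"some-content\">\n  <p>Hello</p>\n</div>\n<div id=\"footer\">Footer</div>\n</body></html>";

// Pattern from the question: it requires single quotes around the id, and
// without the /s modifier "." never matches a newline.
$original = "/<div id=\'some-content\'>(.*)<\/div>/m";
var_dump(preg_match($original, $html, $output)); // int(0), and $output is empty

// Looser pattern: accepts either quote style, lets "." cross newlines (/s),
// and stops at the first closing tag (lazy .*?) instead of the last one.
$looser = '/<div\s+id=["\']some-content["\']\s*>(.*?)<\/div>/s';
if (preg_match($looser, $html, $match)) {
    echo trim($match[1]), "\n"; // <p>Hello</p>
}
```

The lazy `(.*?)` still stops at the first `</div>`, so content containing nested divs gets cut short; that limitation is why both commenters recommend an HTML parser.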

1 Answer


Use a DOM parser. It has been said innumerable times that regular expressions are not powerful enough to parse HTML.

PHP's built-in DOM parser. This is a decent DOM parser for PHP. Be sure to read this thread on SO: the legendary catalogue.
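A minimal sketch of the DOM approach using PHP's built-in `DOMDocument` and `DOMXPath` classes. The HTML string is a hypothetical stand-in for the fetched page; with a real site you would fetch it using a full URL that includes the `http://` scheme.

```php
<?php
// Hypothetical page source standing in for the fetched HTML.
$html = '<html><body><div id="some-content"><p>Hello <a href="/x">link</a></p></div>'
      . '<div id="footer">Footer</div></body></html>';

$doc = new DOMDocument();
// Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Locate the div by its id attribute, regardless of quoting or whitespace in the source.
$xpath = new DOMXPath($doc);
$div = $xpath->query('//div[@id="some-content"]')->item(0);

$inner = '';
if ($div !== null) {
    // Serialize each child node to build the div's inner HTML.
    foreach ($div->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
}
echo $inner, "\n";
```

Relative URLs such as `/x` are copied verbatim, so for links and CSS to keep working outside the original site they would still need to be resolved against the page's base URL.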

Community
check123
  • This does not answer the question. If you are only going to suggest using a DOM parser, provide it as a comment and point the OP to the canonical at http://stackoverflow.com/questions/3577641/how-to-parse-and-process-html-with-php there – Gordon Feb 23 '12 at 13:21
  • On a side note, PHP's PCRE regex can very much handle HTML. It's usually not the language but rather the developer who is not up to the job. "That Answer" on SO everyone links to is wrong. – Gordon Feb 23 '12 at 14:13
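The side note about PCRE can be illustrated with PCRE's recursion support. The following is a sketch, not a recommendation: the markup is hypothetical, and the pattern assumes the id is written with double quotes and that no `<div` text appears inside comments, scripts or attribute values.

```php
<?php
// Hypothetical markup with a nested div inside the target div.
$html = '<div id="some-content"><p>a</p><div class="x">b</div>c</div><div>after</div>';

// (?(DEFINE)...) declares a named subpattern "div" that matches one balanced
// <div>...</div> element, calling itself for nested divs. The main pattern
// captures everything between the target div's opening tag and its matching
// closing tag; the possessive *+ avoids needless backtracking.
$pattern = '~<div\s+id="some-content"[^>]*>((?:(?&div)|(?!</?div\b).)*+)</div>'
         . '(?(DEFINE)(?<div><div\b[^>]*>(?:(?&div)|(?!</?div\b).)*+</div>))~s';

if (preg_match($pattern, $html, $m)) {
    echo $m[1], "\n"; // <p>a</p><div class="x">b</div>c
}
```

Unlike a lazy `(.*?)` pattern, which stops at the first `</div>`, this returns the content up to the matching `</div>`, but it is noticeably harder to read and maintain than an XPath query.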