
I am trying to index a website, but my `preg_match_all` call returns an empty array.

This is what I have so far:

```php
$content = get_content("www.something.com");
preg_match_all('#<span class="box_cod">Cod: ([0-9\.]*)</span><span class="box_pret">PRET: (.*)</span>#', $content, $Produs);
```

Here `get_content` is a cURL-based function that retrieves the page.
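The question does not show `get_content` itself. A minimal cURL-based version might look like the sketch below; only the name comes from the question, and the body (options, timeout, error handling) is an assumption rather than the asker's actual code.

```php
<?php
// Hypothetical implementation of the get_content() helper from the question.
// The options and error handling are assumptions, not the asker's code.
function get_content(string $url): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true, // follow redirects (e.g. http -> https)
        CURLOPT_TIMEOUT        => 30,   // give up after 30 seconds
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        // Fail loudly so an empty result is never mistaken for "no matches".
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    // In PHP 8+ the handle is an object and is released when $ch goes out of scope.
    return $body;
}
```

Throwing on failure rules out a failed download as the reason `preg_match_all` finds nothing. Including the scheme in the URL (for example `http://www.something.com`) also avoids relying on cURL's protocol guessing.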

Thank you!

CharlesB
- It's very difficult to parse HTML with regular expressions. Have you considered using a real DOM parser? – Álvaro González, Mar 04 '13 at 12:38
- Excellent... another opportunity to tell someone about [Tony The Pony](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454)! I'll never tire of this. – SDC, Mar 04 '13 at 12:49
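The comments' point is well taken: the pattern only matches if the markup is exactly what it expects. As a minimal sketch, assuming (hypothetically, since the real page is not shown) that the live HTML puts a line break between the two spans, the following compares the original pattern with a slightly relaxed one on made-up sample markup.

```php
<?php
// Hypothetical sample markup: assumes a newline between the two spans
// and more than one product on the page.
$content = <<<HTML
<span class="box_cod">Cod: 123.45</span>
<span class="box_pret">PRET: 99 lei</span>
<span class="box_cod">Cod: 678</span>
<span class="box_pret">PRET: 150 lei</span>
HTML;

// Original pattern: requires </span><span ...> with nothing in between,
// so a single newline between the spans makes it match nothing.
$strict = '#<span class="box_cod">Cod: ([0-9\.]*)</span><span class="box_pret">PRET: (.*)</span>#';

// Relaxed pattern: \s* tolerates whitespace between the spans, and the lazy
// (.*?) stops at the first closing </span> instead of the last one on a line.
$relaxed = '#<span class="box_cod">Cod: ([0-9\.]*)</span>\s*<span class="box_pret">PRET: (.*?)</span>#';

$strictCount = preg_match_all($strict, $content);
$count = preg_match_all($relaxed, $content, $Produs);

echo $strictCount, "\n"; // prints 0
print_r($Produs[1]);     // captured codes (group 1)
print_r($Produs[2]);     // captured prices (group 2)
```

Even with such tweaks, any change to attribute order, quoting, or nesting on the real site will break the pattern again, which is why a DOM parser is the sturdier choice.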

1 Answer


You can use PHP Simple HTML DOM Parser to fetch the page and parse its content into a variable. First include the library's PHP file, then load the page:

```php
include 'simple_html_dom.php'; // the library file, downloaded separately

// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
```

It's easier than parsing HTML with regular expressions.
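Simple HTML DOM is a third-party library that has to be downloaded and included. If adding a dependency is undesirable, PHP's bundled DOM extension (`DOMDocument` plus `DOMXPath`) can do the same job; this is a swapped-in technique, not what the answer above uses. A minimal sketch, assuming the markup looks like the hypothetical snippet in the code:

```php
<?php
// Sketch using PHP's built-in DOM extension instead of Simple HTML DOM.
// The markup below is hypothetical; the real page is not shown.
$content = '<div><span class="box_cod">Cod: 123.45</span>'
         . '<span class="box_pret">PRET: 99 lei</span></div>';

$doc = new DOMDocument();
// Real-world HTML is often invalid: collect parser warnings instead of printing them.
// Note: without a charset declaration, loadHTML() assumes ISO-8859-1.
libxml_use_internal_errors(true);
$doc->loadHTML($content);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$produse = [];
// Assumes class="box_cod" is the exact attribute value (no extra classes).
foreach ($xpath->query('//span[@class="box_cod"]') as $codSpan) {
    // The first following sibling span with class box_pret holds the price.
    $pretSpan = $xpath->query('following-sibling::span[@class="box_pret"][1]', $codSpan)->item(0);
    $produse[] = [
        'cod'  => trim(str_replace('Cod:', '', $codSpan->textContent)),
        'pret' => $pretSpan ? trim(str_replace('PRET:', '', $pretSpan->textContent)) : null,
    ];
}

print_r($produse);
```

Because XPath works on the parsed tree, whitespace or line breaks between the spans no longer matter.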

Max Muller