0

Possible Duplicate:
How to parse and process HTML with PHP?

I am working on a script that is supposed to scrape the HTML of a page and find the "Contact" or "Contact Us" URL on it. I have the page URL, and I was able to fetch its HTML using cURL.

Now all I need to do is find the contact link, then try to extract the email address and the phone number from the page it points to.

My question is: how do I find the contact URL? What should I look for? Should the link text contain the word "contact", or should the URL contain it? What would the regex for that look like?

Second, once I have the contact page, I should be able to find a regex online that extracts the email address and the phone number. So I really just need to find the contact link. The pages I am scraping are blogs.

gprime
  • Perhaps you could use [Goutte](https://github.com/fabpot/Goutte), which is a PHP library for web scraping. Certainly a better idea than the insanity of trying to use regex to parse HTML. – SDC Nov 29 '12 at 15:34

2 Answers

1

To find the contact page URL, I think you'd be better off using an XML parser to scan the DOM (for example, the `<a>` tags).

If you know jQuery, you can use phpQuery, a PHP HTML parser that mimics jQuery's selectors.

Parsing HTML with regex is generally a bad idea; see "Parsing Html The Cthulhu Way".
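As a rough illustration of the DOM approach, here is a minimal sketch using PHP's built-in `DOMDocument` instead of phpQuery. The helper name `findContactLinks` and the sample markup are assumptions made up for this example. It simply keeps every `<a>` whose link text or `href` contains "contact" (case-insensitive), which is only one possible heuristic.

```php
<?php
// Hypothetical helper (not from the original answer): collect the href of
// every <a> element whose visible text or URL mentions "contact".
function findContactLinks(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        if ($href === '') {
            continue; // anchors without a target are of no use here
        }
        // Match on either the link text or the URL itself.
        if (stripos($a->textContent, 'contact') !== false
            || stripos($href, 'contact') !== false) {
            $links[] = $href;
        }
    }

    // Drop duplicates while keeping the original order.
    return array_values(array_unique($links));
}

// Sample markup, invented for the example.
$html = '<html><body>'
      . '<a href="/about">About</a> '
      . '<a href="/contact-us">Contact Us</a> '
      . '<a href="/reach">Contact</a>'
      . '</body></html>';

print_r(findContactLinks($html));
```

The returned URLs may be relative (as in the sample), so they would still need to be resolved against the page URL before fetching the contact page.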

Vince
0

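You can fetch the contact page with cURL and then run preg_match_all on the result:

    // curl_exec() returns the page body only if CURLOPT_RETURNTRANSFER is set
    $result = curl_exec($resource);
    // Collect everything that looks like an email address
    preg_match_all("/[._a-zA-Z0-9-]+@[._a-zA-Z0-9-]+/i", $result, $matches);
    print_r($matches[0]);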
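The pattern above is quite permissive: it also accepts strings such as `foo@bar` and returns duplicates. A slightly stricter sketch might require a dot-separated domain and run each candidate through `filter_var`. The helper name `extractEmails` and the sample text are assumptions for illustration only.

```php
<?php
// Hypothetical helper: pull candidate addresses out of a string, validate
// each with PHP's built-in email filter, and return a de-duplicated list.
function extractEmails(string $text): array
{
    // Require at least one dot and a 2+ letter suffix in the domain part.
    preg_match_all(
        '/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/',
        $text,
        $matches
    );

    $emails = [];
    foreach ($matches[0] as $candidate) {
        if (filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false) {
            // Lowercase so that differently-cased copies collapse to one entry.
            $emails[] = strtolower($candidate);
        }
    }

    return array_values(array_unique($emails));
}

// Sample text, invented for the example.
$sample = 'Mail info@example.com or INFO@example.com, not foo@bar';
print_r(extractEmails($sample));
```

Addresses obfuscated on the page (for example "info at example dot com") will not be found this way.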
pregmatch