I am a php newb but I am pretty sure this will be hard to accomplish and very server consuming. But I want to ask, get the opinion of much smarter users than myself.
Here is what I am trying to do:
I have a list of URL's, an array of URL's actually.
For each URL, I want to count the outgoing links - which DO NOT HAVE REL="nofollow" attribute - on that page.
So in a way, I'm afraid I'll have to make php load the page and preg match using regular expressions all the links?
Would this work if I'd had lets say 1000 links?
Here is what I am thinking, putting it in code:
$homepage = file_get_contents('http://www.site.com/');
$homepage = htmlentities($homepage);
// Do a preg_match for http:// and count the number of appearances:
$urls = preg_match();
// Do a preg_match for rel="nofollow" and count the nr of appearances:
$nofollow = preg_match();
// Do a preg_match for the number of "domain.com" appearances so we can subtract the website's internal links:
$internal_links = preg_match();
// Substract and get the final result:
$result = $urls - $nofollow - $internal_links;
Hope you can help, and if the idea is right maybe you can help me with the preg_match functions.