The objective of this regular expression is to determine whether a web page contains backlink(s) to a given domain and, if so, whether all of them have a rel="nofollow" attribute on the a tag. The result should be True if every backlink has rel="nofollow", and False if any of them does not.
From any web page I want to check whether anything like this is present:
<a ... href="http://www.mysite.com/xyz...." ... >
Additionally, I need to know whether any of the links found is missing the rel="nofollow" attribute.
The domain www.mysite.com is known in advance, and I want to detect such links wherever they appear in the page, even inside comments.
I could do the above myself, but I can't think of an optimized way to do it using a single pattern.
One unoptimized way is to find all occurrences of a tags whose href points to mysite.com and check whether even a single match lacks rel=nofollow.
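To make the two-step approach concrete, here is a minimal Python sketch. The language, the helper names (make_href_pattern, all_backlinks_nofollow) and the choice to match only http/https URLs on the exact host are assumptions for illustration, not a tested production solution. It scans the raw text, so links inside comments are found too.

```python
import re

# Step 1 pattern: every opening <a ...> tag in the raw text (comments included).
A_TAG = re.compile(r'<a\b[^>]*>', re.IGNORECASE)

# rel attribute whose value contains the token "nofollow" (quoted or unquoted).
NOFOLLOW = re.compile(r'''\brel\s*=\s*["']?[^"'>]*\bnofollow\b''', re.IGNORECASE)


def make_href_pattern(domain):
    """Hypothetical helper: href pointing at http(s)://<domain>, exact host only."""
    return re.compile(
        r'''href\s*=\s*["']?https?://''' + re.escape(domain)
        + r'''(?=[/"'\s>?#:]|$)''',
        re.IGNORECASE,
    )


def all_backlinks_nofollow(html, domain):
    """Return None if no backlink exists, True if every backlink has
    rel=nofollow, and False if at least one backlink lacks it."""
    href = make_href_pattern(domain)
    # Step 1: keep only the <a> tags that link to the domain.
    links = [tag for tag in A_TAG.findall(html) if href.search(tag)]
    if not links:
        return None
    # Step 2: every one of them must carry rel=nofollow.
    return all(NOFOLLOW.search(tag) is not None for tag in links)


page = '<a href="http://www.mysite.com/xyz" rel="nofollow">x</a>'
result = all_backlinks_nofollow(page, "www.mysite.com")
```

One known limitation of this sketch: a literal > inside an attribute value would cut the tag short, so such a link could be misjudged.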
Is there a smart, single-line way to write such a regular expression pattern?
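The kind of single pattern I have in mind might look like the following Python sketch, which uses a negative lookahead to match only a tags that link to the domain and do not contain rel=nofollow anywhere in the tag. The function names and the http/https-only, exact-host matching are assumptions for illustration, and I am not sure it covers every malformed tag.

```python
import re


def followed_backlink_pattern(domain):
    """Hypothetical helper: one pattern matching an <a> tag that links to
    `domain` and has no rel attribute containing "nofollow"."""
    return re.compile(
        # Negative lookahead: reject the tag if rel=...nofollow appears
        # anywhere before its closing '>'.
        r'''<a\b(?![^>]*\brel\s*=\s*["']?[^"'>]*\bnofollow\b)'''
        # Then require an href pointing at http(s)://<domain>.
        r'''[^>]*\bhref\s*=\s*["']?https?://''' + re.escape(domain)
        + r'''(?=[/"'\s>?#:]|$)''',
        re.IGNORECASE,
    )


def has_followed_backlink(html, domain):
    """True if at least one backlink to `domain` lacks rel=nofollow."""
    return followed_backlink_pattern(domain).search(html) is not None


page = '<a rel="nofollow" href="http://www.mysite.com/xyz">x</a>'
flagged = has_followed_backlink(page, "www.mysite.com")
```

With this shape, a page needs human attention when has_followed_backlink returns True. Note that it cannot tell "no backlinks at all" apart from "all backlinks are nofollow"; both give False.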
PS: I don't want to parse the DOM, since a parsing error could cause a backlink to be missed, and Google's DOM parser could be different. I want human attention to go only to those pages whose links could cause a backlink penalty from search engines. If a link inside a comment is flagged as a backlink and takes up some human attention, that's no problem. But links from, say, a porn site must be caught at any cost. Ultimately I want to prepare a list of spam links that I can submit in Google Webmaster's Disavow tool. This exercise is a must for every webmaster, once a month or so, for every site. And I can't afford a paid service like this one: www.linkdetox.com