I am new to PHP and am trying to develop a system that catches users who post links to spam sites on a social website (for example, in a comment or post on Pinterest, a blog, etc.).
These are the methods I am using. When a user enters text into a post/comment field, I go through the text and extract all the URLs in it. Then, for each URL:
- Compare the title of the webpage to its body to see how many of the words in the title also appear in the body, then give it a rank.
- Compare the meta tags to the body of the webpage to see whether the meta tag keywords appear in the body, then give it a rank.
- Compare the anchor text to the body of the webpage.
- Compare the keywords in the URL to the body of the webpage.
- Check whether the webpage contains any pornographic words.
- Check for blacklisted sites by comparing the URLs against an online database.
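
To make the steps above concrete, here is a minimal PHP sketch of three of them: extracting URLs from the submitted text, scoring how many title words appear in the page body, and matching a URL's host against a blacklist. The function names (extractUrls, pageWords, titleBodyScore, isBlacklisted), the three-character minimum word length, and the idea of passing the blacklist in as a plain array are all assumptions made for illustration. The title and body are pulled out with regular expressions only to keep the sketch dependency-free; a real implementation would probably want a proper HTML parser, and would need to fetch the page and query the online blacklist itself.

```php
<?php
// Hypothetical helper: pull every http(s) URL out of user-submitted text.
function extractUrls(string $text): array
{
    preg_match_all('~https?://[^\s<>"\']+~i', $text, $matches);
    // Trailing punctuation usually belongs to the sentence, not the URL.
    $urls = array_map(fn($u) => rtrim($u, '.,;:!?)'), $matches[0]);
    return array_values(array_unique($urls));
}

// Hypothetical helper: lowercase a text and return its distinct words
// of three or more ASCII letters/digits (an arbitrary cut-off).
function pageWords(string $text): array
{
    preg_match_all('/[a-z0-9]{3,}/', strtolower($text), $m);
    return array_values(array_unique($m[0]));
}

// Fraction of title words that also occur in the body (0.0 to 1.0).
// Takes the HTML as a string; fetching the page is left out on purpose.
function titleBodyScore(string $html): float
{
    if (!preg_match('~<title[^>]*>(.*?)</title>~is', $html, $t)
        || !preg_match('~<body[^>]*>(.*)</body>~is', $html, $b)) {
        return 0.0; // no usable title or body
    }
    $titleWords = pageWords(html_entity_decode(strip_tags($t[1])));
    if ($titleWords === []) {
        return 0.0;
    }
    // Flip the body words into keys for constant-time lookups.
    $bodyWords = array_flip(pageWords(html_entity_decode(strip_tags($b[1]))));
    $hits = 0;
    foreach ($titleWords as $word) {
        if (isset($bodyWords[$word])) {
            $hits++;
        }
    }
    return $hits / count($titleWords);
}

// True if the URL's host equals a blacklisted domain or is a subdomain of one.
// $blacklist is assumed to be a list of lowercase domain names.
function isBlacklisted(string $url, array $blacklist): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false;
    }
    $host = strtolower($host);
    foreach ($blacklist as $bad) {
        $suffix = '.' . $bad;
        if ($host === $bad || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}
```

A caller would run extractUrls() on the comment text, download each page, and then combine titleBodyScore() with the other checks into an overall rank. The same word-overlap function could be reused for the meta tag, anchor text, and URL keyword comparisons by swapping in a different source string for the title.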
Can you please tell me whether there are any other methods I can use to determine if a user-supplied URL points to a spam or marketing site? Any help would be greatly appreciated.