
I am new to PHP and am trying to develop a system that will catch people who try to post links to spam sites on a social website (for example in a comment or post on Pinterest, a blog site, etc.).

My approach is as follows: when a user enters text into a post or comment field, I go through the text and extract all the URLs in it. Then, for each URL, I:

  1. Compare the title of the webpage to its body to see how many words in the title appear in the body, then give it a rank.
  2. Compare the meta tags to the body of the webpage to see whether the meta tags appear in the body, then give it a rank.
  3. Compare the anchor text to the body of the webpage.
  4. Compare keywords in the URL to the body of the webpage.
  5. Check whether the webpage contains any porn words.
  6. Check for blacklisted sites by comparing the URL against an online database.
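The checks in the list above can be sketched roughly as follows. This is a minimal Python sketch, not the poster's PHP code, and it rests on several assumptions: the page HTML has already been downloaded, the "rank" for checks 1 to 4 is a simple word-overlap ratio between 0 and 1, and `BAD_WORDS` and `BLACKLIST` are hypothetical local stand-ins for a real word list and an online blacklist lookup. `extract_urls` covers the first step (pulling URLs out of the user's text).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

URL_RE = re.compile(r"https?://[^\s<>\"']+")   # rough pattern, not a full URL grammar
WORD_RE = re.compile(r"[a-z0-9]+")
BAD_WORDS = {"casino", "viagra"}               # hypothetical word list (check 5)
BLACKLIST = {"spam.example"}                   # hypothetical stand-in for an online blacklist (check 6)


def extract_urls(text):
    """Return every http(s) URL in a post/comment, minus trailing punctuation."""
    return [u.rstrip(".,);!?") for u in URL_RE.findall(text)]


def words(text):
    """Lower-cased word tokens used by all the overlap comparisons."""
    return set(WORD_RE.findall(text.lower()))


class PageParser(HTMLParser):
    """Collects the <title>, meta keywords/description and visible text."""

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.body = "", "", []
        self._in_title = False
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag in ("script", "style"):
            self._skip += 1
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("keywords", "description"):
            self.meta += " " + (attrs.get("content") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif not self._skip:
            self.body.append(data)


def overlap(candidate, body_words):
    """Fraction of the candidate's words that also occur in the page body."""
    cand = words(candidate)
    return len(cand & body_words) / len(cand) if cand else 0.0


def score_page(url, html, anchor_text=""):
    """Compute one signal per check; combining them into a verdict is left open."""
    parser = PageParser()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    body = words(" ".join(parser.body))
    parsed = urlparse(url)
    return {
        "title": overlap(parser.title, body),         # check 1
        "meta": overlap(parser.meta, body),           # check 2
        "anchor": overlap(anchor_text, body),         # check 3
        "url_keywords": overlap(parsed.path, body),   # check 4 (path words only)
        "bad_words": len(body & BAD_WORDS),           # check 5
        "blacklisted": (parsed.hostname or "").lower() in BLACKLIST,  # check 6
    }
```

Note that a page with no title or meta tags scores 0 on those signals, so how the individual scores are weighted into a final spam decision still has to be chosen separately.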

Can you please tell me whether there are any other methods I can use to determine if a user-submitted URL points to a spam or marketing site? Any help would be greatly appreciated.

Asked by Justin k (edited by JJJ)

1 Answer


This question doesn't actually seem to be PHP-specific. But anyway...

Here is a similar post with some ideas:

Detecting a (naughty or nice) URL or link in a text string

Also, scientific papers on the subject are probably worth looking at. Here's one to get you started:

http://dl.acm.org/citation.cfm?id=2093493&dl=ACM&coll=DL&CFID=337935760&CFTOKEN=13189143

Answered by inquam (edited by Community)
Thanks a lot for your time and for all this information. The last link really helped a lot. Thank you. – Justin k Jun 16 '13 at 10:48