
I have a large list of users who registered on a website that had no spam filter active during registration.

I would like to identify which registered users are likely spammers. I'm trying to use Akismet for this, but so far it reports every user as not a spammer. This is probably because Akismet is really designed for comments, which aren't available during registration.

I send Akismet the username and email. For the URL I use the email domain. For the comment I use: "Hi, I'm $username from $domain registered on $date with email $email and website $url".

However, as I said, this always reports users as valid, even when a user looks like a spammer.

If you're interested in the full code:

<?php

// bring php process to this dir
chdir(dirname(__FILE__));


// include Joomla Framework
require('../bootstrap-joomla.php');

// akismet class
require('akismet.class.php');

/**
 * Retrieves users not yet validated
 */
function getUsers($limit = 10) {
  global $database;
  $database->setQuery("SELECT * FROM jos_users WHERE akismet_validated = 0 LIMIT " . intval($limit));
  $Users = $database->loadObjectList();
  return $Users;
}

/**
 * sets the validation results for the user
 */
function saveValidationResult($userid, $spammer) {
  global $database;
  $database->setQuery("UPDATE jos_users SET akismet_validated = 1, akismet_spammer = " . intval($spammer) . " WHERE id = " . intval($userid) . " LIMIT 1");
  return $database->query();
}

// get non validated users
$Users = getUsers();

// validate each user
foreach($Users as $User) {
  list($user, $domain) = explode('@', $User->email);

  $name = $User->username;
  $email = $User->email;
  $url = 'http://' . $domain; // send a full URL rather than a bare domain
  $comment = "Hello, I am $name, registered on $User->registerDate from <a href=\"$url\">$url</a>.\r\n";


  $akismet = new Akismet('http://www.fijiwebdesign.com/', 'c511157d1d98');
  $akismet->setCommentAuthor($name);
  $akismet->setCommentAuthorEmail($email);
  $akismet->setCommentAuthorURL($url);
  $akismet->setCommentContent($comment);
  //$akismet->setPermalink('http://www.fijiwebddesign.com/');


  echo "$User->id, $User->username : ";
  if($akismet->isCommentSpam()) {
    saveValidationResult($User->id, true);
    echo "Spammer";
  } else {
    saveValidationResult($User->id, false);
    echo "Not Spammer";
  }

  echo "\r\n";
}
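If every user comes back as "Not Spammer", one thing worth ruling out before blaming the input is a response that is neither verdict being read as "not spam". The comment-check call answers with a plain-text body of `true` (spam) or `false` (not spam); an error such as a bad API key can produce a different body instead. The sketch below assumes you can get at the raw response body (whether `akismet.class.php` exposes it is not shown here), and `parseCommentCheck` is a hypothetical helper name.

```php
<?php
// Sketch: read an Akismet comment-check response body strictly.
// "true" means spam and "false" means not spam; any other body
// (for example an error message) raises an exception instead of
// being silently counted as "Not Spammer".
function parseCommentCheck(string $body): bool
{
    $body = trim($body);
    if ($body === 'true') {
        return true;
    }
    if ($body === 'false') {
        return false;
    }
    throw new UnexpectedValueException('Unexpected Akismet response: ' . $body);
}

// Usage with a few example bodies:
foreach (["true", "false\n", "invalid"] as $body) {
    try {
        echo var_export(parseCommentCheck($body), true), "\n";
    } catch (UnexpectedValueException $e) {
        echo $e->getMessage(), "\n";
    }
}
```

Failing loudly here means a misconfigured key shows up as an error rather than as a clean bill of health for every account.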
bucabay
You can't detect a spammer based on email address and domain, so don't even bother. Register people with a CAPTCHA on the form; then you just have to see what they post. – Mar 24 '11 at 03:10

3 Answers


It's best to think of Akismet as a giant Bayesian spam filter with some other heuristics. It works on the contents of a post, the timing of a post, and most importantly, how frequently it has seen similar content that has been reported as spam. The string you're feeding it is fairly unique, so other users will not have trained it on whether that kind of string is spammy. Even if you did somehow get that string marked as spam, you'd end up with a whole bunch of false positives, because you're feeding every user account through it with the same template.
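To make the "unique string" point concrete, here is a toy naive-Bayes-style scorer. It is not Akismet's algorithm (its internals aren't described in this thread); it only shows why a string the filter was never trained on yields no signal: unseen tokens get a neutral probability of 0.5, so the combined score stays at 0.5. The function names and training counts are made up for illustration.

```php
<?php
// Toy naive-Bayes-style scorer, NOT Akismet's real algorithm.

// Probability that a single token indicates spam, from training counts.
function tokenSpamProbability(string $token, array $spam, array $ham,
                              int $nSpam, int $nHam): float
{
    $s = $spam[$token] ?? 0;
    $h = $ham[$token] ?? 0;
    if ($s + $h === 0) {
        return 0.5; // never seen: no evidence either way
    }
    $ps = $s / max($nSpam, 1);
    $ph = $h / max($nHam, 1);
    return $ps / ($ps + $ph);
}

// Combine token probabilities as P = prod(p) / (prod(p) + prod(1 - p)),
// computed in log space to avoid underflow.
function messageSpamProbability(string $text, array $spam, array $ham,
                                int $nSpam, int $nHam): float
{
    $tokens = array_unique(
        preg_split('/\W+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY)
    );
    $logSpam = 0.0;
    $logHam = 0.0;
    foreach ($tokens as $token) {
        $p = tokenSpamProbability($token, $spam, $ham, $nSpam, $nHam);
        $p = min(max($p, 0.01), 0.99); // clamp so log() never sees 0
        $logSpam += log($p);
        $logHam += log(1 - $p);
    }
    return 1 / (1 + exp($logHam - $logSpam));
}

// Hypothetical training counts (token => number of messages containing it).
$spamCounts = ['cheap' => 6, 'viagra' => 8, 'casino' => 5];
$hamCounts  = ['thanks' => 7, 'download' => 4];

echo messageSpamProbability('cheap viagra casino', $spamCounts, $hamCounts, 10, 10), "\n";
echo messageSpamProbability('Hello, I am jdoe, registered on 2011-03-24',
                            $spamCounts, $hamCounts, 10, 10), "\n";
```

The templated registration string contains only tokens this toy filter has never seen, so it scores exactly 0.5: no verdict either way.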

If you believe that you may have illegitimate users on your site, and they have not participated, simply delete the registration. If they are legitimate, they can simply re-register.

If the users are participating, simply look at their contributions. Their spamminess should be obvious.
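The triage described above could look something like the following. This is a hypothetical sketch: `triageUsers` and the `contributions` field are invented here, and you would have to compute that count from your own content tables.

```php
<?php
// Hypothetical triage: each user is an associative array with 'id' and
// 'contributions' (a count computed from your own tables; the field
// name is an assumption).
function triageUsers(array $users): array
{
    $result = ['delete_candidates' => [], 'review' => []];
    foreach ($users as $user) {
        if ((int) $user['contributions'] === 0) {
            // Never participated: delete; a legitimate user can re-register.
            $result['delete_candidates'][] = $user['id'];
        } else {
            // Participated: a human looks at what they posted.
            $result['review'][] = $user['id'];
        }
    }
    return $result;
}

$users = [
    ['id' => 62, 'contributions' => 0],
    ['id' => 63, 'contributions' => 14],
    ['id' => 64, 'contributions' => 0],
];
print_r(triageUsers($users));
```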

Charles
Thanks for the answer. What I had in mind is that others have surely thought of using Akismet for registrations, so it would already have educated guesses. If it isn't used for that, then I believe your answer is the closest. The problem with checking contributions is that many users registered to download products, not to participate, so that would weed out only a very small portion. – bucabay Mar 25 '11 at 05:54

bucabay, use the contact form on Akismet.com to get in touch with us. We'll see if there's something we can do to help improve your results.

You can use Akismet to check signup registrations if it's done right. Accuracy isn't yet at the point where it's something we officially recommend, but we're working on improving it and you're welcome to experiment.
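As a hedged illustration of what "done right" might involve: the current Akismet API documentation lists `signup` as a `comment_type` value and requires the visitor's `user_ip` (with `user_agent` strongly recommended), which are exactly the signals a batch check of old accounts lacks. A minimal sketch of the request fields, assuming you had recorded the IP and user agent at registration time; `buildSignupCheck` and all sample values are hypothetical, the API key is left out, and no synthesized `comment_content` is sent because a signup has no user-written text.

```php
<?php
// Hypothetical sketch: comment-check fields describing a registration.
// Parameter names follow the Akismet REST API; $ip and $userAgent are
// placeholders because the question's jos_users table does not record them.
function buildSignupCheck(string $blog, string $username, string $email,
                          string $ip, string $userAgent): array
{
    // Take everything after the last '@' as the email domain.
    $at = strrpos($email, '@');
    $domain = ($at === false) ? '' : substr($email, $at + 1);

    return [
        'blog'                 => $blog,
        'user_ip'              => $ip,         // required by comment-check
        'user_agent'           => $userAgent,
        'comment_type'         => 'signup',    // content is a registration
        'comment_author'       => $username,
        'comment_author_email' => $email,
        'comment_author_url'   => $domain === '' ? '' : 'http://' . $domain,
    ];
}

$fields = buildSignupCheck('http://www.fijiwebdesign.com/', 'jdoe',
                           'jdoe@example.com', '203.0.113.7', 'Mozilla/5.0');
// URL-encoded body that would be POSTed to the comment-check endpoint.
echo http_build_query($fields), "\n";
```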

Captchas have their own set of problems. The major commercial spambots break them.

Alex
Any tips on how to generate the comment in a way that gives better results? I'm not really worried about accuracy; I just need it good enough that humans can sort out the rest. – bucabay Mar 25 '11 at 05:51
So did you get an answer from them about how to compose the content for a registration form? – Meglio Jan 07 '12 at 10:49

You are reinventing a wheel that has already been built many times, very successfully. Just use reCAPTCHA or one of the methods from here: Practical non-image based CAPTCHA approaches?

Brent Friar
Thanks for the answer, Brent. However, I'm not looking for prevention; I need to clean up an existing database of users. – bucabay Mar 25 '11 at 05:58
Ah, I see. I misunderstood the intent. How many users are you talking about? Do you have spam posts showing up on your site now? It's going to be hard to make Akismet work for your purpose because you are missing some key metrics it measures. – Brent Friar Mar 25 '11 at 11:17