I am attempting to delete 300,000+ spam comments from a Django site that uses the Zinnia blogging app. Zinnia includes a command for deleting spam called, appropriately, spam_cleanup, but running this command spews thousands of the following error before the OS terminates it.

OperationalError: (1040, 'Too many connections')

The code for the spam_cleanup command is as follows:

class Command(NoArgsCommand):
    """
    Command object for removing comments
    marked as non-public and removed.
    """
    help = "Delete the entries's comments marked as non-public and removed."

    def handle_noargs(self, **options):
        verbosity = int(options.get('verbosity', 1))

        content_type = ContentType.objects.get_for_model(Entry)
        spams = comments.get_model().objects.filter(
            #is_public=False, is_removed=True,
            content_type=content_type)
        spams_count = spams.count()
        spams.delete()

        if verbosity:
            print('%i spam comments deleted.' % spams_count)

My initial thought was to break the query down and delete only, say, 80 items at a time by applying a limit, but Django tells me that I can't do that on delete:

AssertionError: Cannot use 'limit' or 'offset' with delete.

It's not reasonable to increase the max connections on MySQL to 300,000, right? I also read that Django emulates cascade on delete but does not set it at the DB level, so a raw SQL query could orphan all the relations. I am lost as to how to perform this delete properly. Please help!
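
For illustration, the batching idea can be sketched by selecting a batch of primary keys first and then deleting exactly those rows, which sidesteps putting a limit on the delete itself. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for MySQL, the function name `delete_in_batches` and the `demo` helper are hypothetical, and the table and column names are taken from the raw query mentioned in the comments. In Django the analogous pattern would presumably be to materialize a sliced `values_list('pk', flat=True)` into a list and call `filter(pk__in=ids).delete()`.

```python
import sqlite3


def delete_in_batches(conn, content_type_id, batch_size=80):
    """Delete matching comments one batch at a time; return rows deleted."""
    total = 0
    while True:
        # Fetch the next batch of primary keys that still match.
        rows = conn.execute(
            "SELECT id FROM django_comments WHERE content_type_id = ? LIMIT ?",
            (content_type_id, batch_size),
        ).fetchall()
        if not rows:
            break
        ids = [row[0] for row in rows]
        placeholders = ",".join("?" * len(ids))
        with conn:  # each batch is committed as its own transaction
            conn.execute(
                f"DELETE FROM django_comments WHERE id IN ({placeholders})",
                ids,
            )
        total += len(ids)
    return total


def demo():
    """Build a small in-memory table (made-up data) and clean it up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE django_comments "
        "(id INTEGER PRIMARY KEY, content_type_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO django_comments (content_type_id) VALUES (?)",
        [(20,)] * 250 + [(7,)] * 10,
    )
    conn.commit()
    deleted = delete_in_batches(conn, 20)
    remaining = conn.execute("SELECT COUNT(*) FROM django_comments").fetchone()[0]
    return deleted, remaining
```

Committing per batch keeps each transaction small, at the cost of one extra SELECT per batch.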

Shane
  • Are there many cascade deletes required? You can surely improve the query a little to make it more efficient and avoid holding the large queryset in memory – Anzel Jan 08 '15 at 23:37
  • Looking at the [Comments model](http://django-contrib-comments.readthedocs.org/en/latest/models.html) it looks like there should be no cascade deletes required. I was hoping to do this in Django but maybe a SQL query is the best way. – Shane Jan 09 '15 at 00:12
  • In this case, you may try the unsafe API **_raw_delete()**, which ignores **signals** and offers no **cascade** protection, but wraps everything in a single SQL query – Anzel Jan 09 '15 at 00:17
  • I think the operational error you are getting is due more to MySQL's default limits than to the Django side... – Anzel Jan 09 '15 at 00:18
  • Thanks Anzel, I ended up just executing the following which worked like a charm and ran in seconds: `with transaction.atomic(): cursor.execute("DELETE FROM django_comments WHERE content_type_id=20")` – Shane Jan 09 '15 at 00:38
  • yea I'm with raw SQL as well, much more efficient and direct :) – Anzel Jan 09 '15 at 00:53
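
The single-statement delete from the comments can also be written with the content type id passed as a query parameter instead of being hardcoded. The following is a rough sketch, not the exact code that was run: it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the function name `delete_comments_for_type` is hypothetical. sqlite3 uses `?` placeholders, whereas Django's cursor expects `%s` with a parameter list.

```python
import sqlite3


def delete_comments_for_type(conn, content_type_id):
    """Delete every comment of one content type in a single statement."""
    with conn:  # commits on success, rolls back if the statement raises
        cursor = conn.execute(
            "DELETE FROM django_comments WHERE content_type_id = ?",
            (content_type_id,),
        )
    return cursor.rowcount  # number of rows the DELETE removed


# Small in-memory table with made-up rows, just to exercise the function.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_comments "
    "(id INTEGER PRIMARY KEY, content_type_id INTEGER)"
)
conn.executemany(
    "INSERT INTO django_comments (content_type_id) VALUES (?)",
    [(20,)] * 5 + [(7,)] * 2,
)
conn.commit()

removed = delete_comments_for_type(conn, 20)
left = conn.execute("SELECT COUNT(*) FROM django_comments").fetchone()[0]
```

As noted in the question, a raw delete bypasses Django's emulated cascades and signals, so it is only safe when nothing else references the deleted rows.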