I have a Django app that needs to monitor an email account that users send emails to. For example, it should save the subject of an email to the database if the sender is already registered. The real application is more complicated than this example, so I want to set up a Celery task to process emails in the background, in a distributed way. I already have a Celery task that sends emails asynchronously, but there are a few issues I'd like your opinions on when it comes to processing received emails:

1) Is there a way for Postfix (or something else similar) to push new emails to a Python script? If not, do I have to poll the inbox periodically from Celery?
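Something like this is what I have in mind for the push approach (just a rough sketch; the alias entry, script path, and process_email task are all hypothetical):

    #!/usr/bin/env python
    # Rough sketch: Postfix pipes each incoming message to this script,
    # e.g. via an /etc/aliases entry such as:
    #     support: "|/usr/local/bin/handle_email.py"
    # The script reads the raw message from stdin and enqueues a Celery
    # task, so the delivery itself stays fast.
    import sys
    from email import message_from_string

    from myapp.tasks import process_email  # hypothetical Celery task

    def main():
        raw = sys.stdin.read()
        msg = message_from_string(raw)
        process_email.delay(sender=msg.get("From", ""),
                            subject=msg.get("Subject", ""),
                            raw=raw)

    if __name__ == "__main__":
        main()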

2) To make sure each email is processed by one and only one worker, which of the following is better? (A sketch of the first option follows the list.)

  • to have a single task poll the inbox, then distribute the jobs to multiple workers to process (e.g. each worker receives N emails)

  • to have multiple workers poll the inbox, each getting some of the emails
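For the first option, I picture something like this (again just a sketch; the Maildir path and the per-email process_email task are hypothetical):

    # One periodic poller enumerates new messages and fans each one out
    # to the worker pool, so no two workers ever claim the same email.
    import mailbox

    from celery import shared_task

    from myapp.tasks import process_email  # hypothetical per-email task

    MAILDIR_PATH = "/var/mail/vmail/inbox"  # illustrative path

    @shared_task
    def poll_inbox():
        box = mailbox.Maildir(MAILDIR_PATH, factory=None)
        for key in box.keys():
            # Pass a str so it survives Celery's default JSON serializer.
            process_email.delay(box.get_string(key))
            # Remove the message so the next poll never dispatches it twice.
            box.remove(key)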

3) For polling the email inbox: since I can access the email server directly, I assume polling the mailbox files is more efficient than polling through IMAP. Is there any downside to doing this?
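For comparison, the IMAP route would look roughly like this (host and credentials are placeholders):

    import imaplib

    # Poll for unseen messages over IMAP instead of reading the
    # mailbox files directly.
    conn = imaplib.IMAP4_SSL("mail.example.com")
    conn.login("user@example.com", "password")
    conn.select("INBOX")
    typ, data = conn.search(None, "UNSEEN")
    for num in data[0].split():
        typ, msg_data = conn.fetch(num, "(RFC822)")
        raw = msg_data[0][1]  # raw message bytes
    conn.logout()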

My current scheme (planned but not yet implemented or tested, so it's just a rough idea):

  • have N workers (say N = 10) poll the inbox (via the files)
  • have a function that computes a hash value for each email
  • worker m picks up an email if (its hash % N) == m; see the sketch after this list. This way each email is processed by one and only one worker, but the problem is that if a worker goes down, some of the emails will never be processed! How can I overcome this?
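The partitioning I have in mind looks like this (all names illustrative):

    import hashlib
    import mailbox

    N = 10  # number of workers

    def bucket_for(raw_bytes):
        # Map a raw message to a worker index via a stable hash.
        digest = hashlib.sha1(raw_bytes).hexdigest()
        return int(digest, 16) % N

    def claim_messages(maildir_path, m):
        # Worker m only yields messages whose hash falls in its bucket;
        # the fixed modulo is exactly why bucket m's mail is stranded
        # if worker m goes down.
        box = mailbox.Maildir(maildir_path, factory=None)
        for key in box.keys():
            raw = box.get_bytes(key)
            if bucket_for(raw) == m:
                yield key, raw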

Thank you in advance for your opinions!

Z. Lin

1 Answer


Postfix is definitely the way to go. See the following questions and answers for ideas:

https://serverfault.com/questions/206477/processing-incoming-emails-with-python

Postfix - How to process incoming emails?
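For instance (a rough sketch; the address and script path are placeholders), a pipe entry in /etc/aliases hands each delivery to a script, which can then enqueue your Celery task:

    incoming: "|/usr/local/bin/handle_email.py"

Run newaliases after editing the file so Postfix picks up the change.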

Rob Osborne