
I have a process that sends about 1500 emails once a week.

The process lives in a Django management command that I plan to run from a crontab. It loops over the users, checking whether each user wants to receive emails and which language they should receive them in, like this:

from django.core.mail import send_mail

for user in users:
    # Skip users who have opted out of emails
    if user['send_mail']:
        # Pick the email language, falling back to English
        if user['lang'] in ("es", "fr"):
            lang = user['lang']
        else:
            lang = "en"

        email = user['email']

        # Send the email (subject and message are built per language elsewhere)
        send_mail(subject, message, from_email, [email])

1500 emails is not much, but I want to keep it scalable, since the number of emails depends on the number of registered users on the platform.

I do not know whether it is scalable as it is now, or whether it would be better to use a Redis queue or Celery.

I am using Amazon Simple Emails Service (SES).
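For context, my current configuration just points Django's SMTP email backend at SES in settings.py, roughly like this (the host, username, and password below are placeholders, not my real values):

```python
# settings.py -- placeholder values; the SES endpoint depends on your
# region, and the credentials are your SES SMTP credentials
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = "email-smtp.us-east-1.amazonaws.com"
EMAIL_PORT = 587
EMAIL_USE_TLS = True
EMAIL_HOST_USER = "YOUR_SES_SMTP_USERNAME"
EMAIL_HOST_PASSWORD = "YOUR_SES_SMTP_PASSWORD"
DEFAULT_FROM_EMAIL = "no-reply@example.com"
```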

  • Please do not use a queue to send emails. Email systems are (by the very nature of the SMTP spec) a giant queue. There are 2 approaches to this: either use a local mailer daemon on this box that will relay the email out and handle retries, or use a system that provides you with a rest interface (like SES) that will retry for you. Celery/Redis is a bad tool for this job as SMTP requires a lot of a client from a retry perspective, and you don't want to code that yourself. – Matthew Story Aug 14 '18 at 14:27
  • @MatthewStory I fundamentally disagree, and I can't help feeling you've missed the point of the question. Yes, emails are a kind of queue, but SES is an API and it takes time to access it; if you try to make 1500 API calls within a Django view, it will time out. – Daniel Roseman Aug 14 '18 at 14:29
  • @DanielRoseman he's moving it to a crontab ... so why would you use celery? Also why do you fundamentally disagree? Emails aren't kind of a queue, they are literally a domain-specific priority queue with a very gnarly retry spec. – Matthew Story Aug 14 '18 at 14:30
  • @MatthewStory thanks for your answer. Can you explain this a little more: "There are 2 approaches to this: either use a local mailer daemon on this box that will relay the email out and handle retries, or use a system that provides you with a rest interface (like SES) that will retry for you". I still do not understand whether it is OK as it is now, or whether I have to make a change to make it scalable. Thanks! – Martin Peveri Aug 14 '18 at 14:31
  • In your position I would have my cronjob send the emails via SES @MartinPeveri. That will scale nicely for you. – Matthew Story Aug 14 '18 at 14:33
  • @MatthewStory, maybe you could elaborate on why you discourage using an async task queue like Celery. It's literally written down in the book Two Scoops of Django to do so. – Yannic Hamann Aug 14 '18 at 14:34
  • I'm sorry for my ignorance, it's the first time I've used SES. Am I using the SES API now? I have my SMTP configured like any Django project, with the settings pointing to my Amazon SMTP, and I use Django's send_mail function. Is that OK, or do I have to use this: https://docs.aws.amazon.com/es_en/ses/latest/APIReference/API_SendEmail.html – Martin Peveri Aug 14 '18 at 14:42
  • @MartinPeveri I would personally (and have in the past) just use the SES API (which is REST over HTTPS) rather than using the SMTP API. – Matthew Story Aug 14 '18 at 15:12
  • @MatthewStory What advantages does it have? It would force me to change part of my code, so I would like to know the advantages. – Martin Peveri Aug 14 '18 at 15:15
  • @YannicHamann I will try to compose an answer to this later today to cover it in more detail, but the short answer is that SMTP is by definition an async task queue with very specific and complicated retry rules that the client must implement, so using Celery is redundant and also likely to result in an incorrect client implementation. That's why I generally prefer either to use a local relay mailer where local delivery is guaranteed (like nullmailer), or to use a REST interface (like SendGrid or SES) from cron. – Matthew Story Aug 14 '18 at 15:15
  • @MartinPeveri HTTP is far simpler than SMTP as a protocol, which is the main benefit to using the REST api for simple tasks like this rather than the SMTP api. That being said, SES is very reliable from within AWS so you can typically assume successful delivery to SES (so there's not a practical downside to using the SMTP API here). This comment thread has become much bigger than I had initially anticipated, so I will try for a more complete answer this afternoon. – Matthew Story Aug 14 '18 at 15:18
  • @MatthewStory I understand that there is no big difference, but HTTP is simpler. Thanks! – Martin Peveri Aug 14 '18 at 15:32
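To make the SMTP-vs-REST discussion above concrete, a direct SES API call via boto3 looks roughly like this (the region, addresses, and the `build_ses_message` helper are illustrative, not from the thread; the live network call is commented out because it needs real AWS credentials):

```python
# boto3 (pip install boto3) is assumed; the live call is commented out.
def build_ses_message(source, to_addrs, subject, body):
    """Assemble the kwargs that the SES SendEmail action expects."""
    return {
        "Source": source,
        "Destination": {"ToAddresses": to_addrs},
        "Message": {
            "Subject": {"Data": subject},
            "Body": {"Text": {"Data": body}},
        },
    }

msg = build_ses_message(
    "no-reply@example.com", ["user@example.com"], "Weekly update", "Hello!"
)
# import boto3
# ses = boto3.client("ses", region_name="us-east-1")
# ses.send_email(**msg)  # one HTTPS request per message; retries are yours
```

With this approach each message is one HTTPS request, so the retry and bookkeeping logic discussed above becomes your code's responsibility.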

1 Answer


You have two different issues to deal with here:

First, while it is pretty easy to SEND 1500 emails, there are complex realities around whether those 1500 emails will be RECEIVED. Your email can easily be blocked or diverted to a spam folder, and your whole domain could be blocked by some mail services. To limit the chance of these difficulties, you need to have DKIM and SPF records set up properly, and there are other things that commercial mail senders do to keep things running smoothly. So if you are not interested in taking on that challenge, you are better off working with a professional service like SES.

But sure, you can also just use postfix or any other mail relay software to set up your own mail server locally, even right on the same machine: set up your own DNS records and send the mail directly to the recipient without SES or anyone else in between. But then you have to deal with any spam-blocker problems yourself.

Second, assuming you use SES, you have to make sure that all your emails are safely delivered from your machine to Amazon. This is where trouble can come in. You don't want to generate half your emails and have them delivered, then, due to say a network outage, hit a problem and have no way of sending only the messages that were not sent without resending them all. It can be a tricky bit of code to write perfectly.

The easiest solution technically is to install a local SMTP relay server (e.g. postfix) configured with Amazon as its "smarthost". Configure Django to use "localhost" as its SMTP server.
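A minimal postfix `main.cf` for that setup might look like this (a sketch, not a complete configuration: the relayhost endpoint depends on your SES region, and `/etc/postfix/sasl_passwd` is assumed to hold your SES SMTP credentials):

```
relayhost = [email-smtp.us-east-1.amazonaws.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd
smtp_sasl_security_options = noanonymous
smtp_tls_security_level = encrypt
inet_interfaces = 127.0.0.1
```

The last line keeps postfix listening only on localhost, which matters for the security point raised in the comments below.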

With that in place, when your cron job runs, it will only take a few seconds, because all the emails go straight into postfix's directories on your local drive and are queued there.

Then postfix, because it is configured with SES's SMTP server as its smarthost (sometimes called smart relay), won't send any email directly to the recipient, but will forward all the emails to SES to be delivered to the final recipient. If there's any problem doing that, postfix (or whatever mail relay software you prefer) will retry each message until things work out.

It's made for that, it's tried, tested, works...

So that is the easiest path for you.

If you choose to use the SES REST API, then it is the responsibility of your code to make sure that each message is delivered to Amazon exactly once. If you loop through 1000 emails and then there is a network failure or a crash and you fail to send the last 500, it is your code's problem to recover without resending the first 1000. And for that, yes, queuing systems are useful. Celery, or just RabbitMQ by itself, can work. Or just make a queue by storing a record in your database for each message that needs to be sent, then deleting (or marking) each record as its email goes out.
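As a sketch of that last idea (the table and column names are made up, sqlite stands in for your real database, and `send_one` stands in for the actual SES call), the "database as queue" approach can be as small as:

```python
import sqlite3

def send_pending(conn, send_one):
    """Send every queued message that has not been marked sent yet."""
    rows = conn.execute(
        "SELECT id, email FROM outbox WHERE sent_at IS NULL"
    ).fetchall()
    for row_id, email in rows:
        send_one(email)  # hand the message to SES; may raise on failure
        # Mark as sent only AFTER SES accepted it, so a crash re-sends
        # at most the one in-flight message, never the whole batch
        conn.execute(
            "UPDATE outbox SET sent_at = datetime('now') WHERE id = ?",
            (row_id,),
        )
        conn.commit()

# Demo with an in-memory database and a fake sender
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, email TEXT, sent_at TEXT)"
)
conn.executemany(
    "INSERT INTO outbox (email) VALUES (?)",
    [("a@example.com",), ("b@example.com",)],
)
sent = []
send_pending(conn, sent.append)  # sent == ["a@example.com", "b@example.com"]
```

If the process dies mid-run, the next cron invocation simply picks up the rows where `sent_at` is still NULL.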

But writing code like that which works perfectly in every circumstance can be tricky. Sometimes it is OK to re-invent the wheel, and sometimes you need a better wheel :) But in this case I think you are better off using an SMTP relay server.

little_birdie
  • A relay server is almost always the way to go. – Matthew Story Aug 14 '18 at 16:35
  • Excellent answer, I will investigate how to implement an SMTP relay server. What I do not understand is the following: as I use it now, configuring the SES SMTP in settings.py and using the send_mail function, is it scalable, and what problems can I have besides losing emails? Can the server get saturated? – Martin Peveri Aug 14 '18 at 17:25
  • @MatthewStory I understood; what I still do not understand is what problems I could have with my current configuration and code. – Martin Peveri Aug 14 '18 at 18:37
  • There are many many postings online about how to set up postfix to relay mail through SES. Here's one of them. http://www.tothenew.com/blog/configuring-server-to-relay-email-through-amazon-ses/ But I suggest you look at a few of them. Aside from making sure that postfix forwards all mail to SES, you need to also make sure that postfix does not expose its smtp to the outside world! To do this, make sure you have `inet_interfaces = 127.0.0.1` in your configuration! Very important. Else the whole world will be able to send spam through your SES account. – little_birdie Aug 14 '18 at 20:15
  • To test that your smtp server is not exposed, first try (on your server) `telnet localhost 25`.. you should get a greeting message from postfix. Then try the same thing using the external ip address of your server.. it should not connect. – little_birdie Aug 14 '18 at 20:17
  • To answer your question, with sending just 1500 emails you have very little to worry about regarding server capacity unless the emails are huge. Even a very slow machine with a slow disk can handle that. Make sure you have a good amount of free space on your hard disk. The difference between configuring Django to talk directly to SES's SMTP vs. having your own mail relay in the middle is that with the mail relay in the middle, if the SES server is not reachable, your mail will be queued locally and sent when it is possible. – little_birdie Aug 14 '18 at 23:06
  • @little_birdie So the main problem with the configuration I have now (instead of an SMTP relay server) is that I could lose emails, since they would not be queued locally. Is that correct? And wouldn't I have performance problems if tomorrow I have to send 10k or 20k emails? Thanks for your answer. – Martin Peveri Aug 15 '18 at 00:42
  • Even 20K emails, while not a small number, is not a huge number either if your server hardware is reasonable. You can still do that with your simple loop and let postfix do the work. But the danger that you have there is that the longer it takes to generate your emails, the greater the chance that the process could be interrupted while you are doing so. In a simple setup like yours, there is always going to be a point at which, if you fail, you won't know what messages to retry. The only way to solve that would be to commit information to a flat file or database. – little_birdie Aug 15 '18 at 01:38
  • If you are planning to send a LOT of emails, and want a really bulletproof solution, then you are going to have to write something much more sophisticated. For your original problem of 1500 emails it is pretty simple. – little_birdie Aug 15 '18 at 01:40
  • Ah OK, I understood. Excellent answer, now it is super clear. Thanks! – Martin Peveri Aug 15 '18 at 01:56