I'm trying to figure this out myself right now. One thing I can tell you is that you should definitely use email.header.Header to detect header injections:
>>> from email.header import Header
>>> Header('Test').encode()
'Test'
>>> Header('Test\n').encode()
'Test'
>>> Header('Test\nTest2').encode()
'Test\nTest2'
>>> Header('Test\nFrom').encode()
'Test\nFrom'
>>> Header('Test\nFrom:').encode()
(...)
HeaderParseError: header value appears to contain an embedded header: 'Test\nFrom:'
Also check this answer. I agree that if the input is potentially dangerous you should just reject it, as that probably means someone is trying to do something sketchy.
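A minimal sketch of the "just reject it" approach, built on the Header behaviour shown above. The helper names is_safe_header_value and require_safe_header are made up for illustration and are not part of the email package. Note that this only catches what Header.encode() itself flags, so a value like 'Test\nFrom' (no colon) still passes, exactly as in the session above.

```python
from email.errors import HeaderParseError
from email.header import Header


def is_safe_header_value(value):
    """Return False if Header.encode() reports an embedded header."""
    try:
        Header(value).encode()
    except HeaderParseError:
        return False
    return True


def require_safe_header(value):
    """Reject suspicious input outright instead of trying to clean it up."""
    if not is_safe_header_value(value):
        raise ValueError("rejected header value: %r" % (value,))
    return value


print(is_safe_header_value("Test"))         # True
print(is_safe_header_value("Test\nFrom:"))  # False
```

Raising instead of stripping the newline keeps the policy simple: anything that looks like an injection attempt never reaches the message at all.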
EDIT:
It turns out that MIME messages validate headers on their own, even if you don't use email.header.Header, and also nicely encode the body:
>>> from email.mime.text import MIMEText
>>> msg = MIMEText('something\r\nsomething2', 'plain', 'UTF-8')
>>> msg.as_string()
'MIME-Version: 1.0\nContent-Type: text/plain; charset="utf-8"\nContent-Transfer-Encoding: base64\n\nc29tZXRoaW5nDQpzb21ldGhpbmcy\n'
>>> msg['From'] = 'me@localhost\r\nSubject: injected subject'
>>> msg.as_string()
(...)
HeaderParseError: header value appears to contain an embedded header: 'me@localhost\nSubject: injected subject'
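As the session above shows, assigning the bad value does not fail; the error only appears when the message is serialized. Here is a rough sketch of how that could be handled in application code. The function build_message and its parameters are hypothetical names, and returning None on failure is just one possible policy.

```python
from email.errors import HeaderParseError
from email.mime.text import MIMEText


def build_message(sender, subject, body):
    """Return the serialized message, or None if a header looks injected."""
    msg = MIMEText(body, "plain", "UTF-8")
    # Assignment itself does not validate anything.
    msg["From"] = sender
    msg["Subject"] = subject
    try:
        # Headers are checked here, when the message is flattened to text.
        return msg.as_string()
    except HeaderParseError:
        return None


ok = build_message("me@localhost", "hello", "body text")
bad = build_message("me@localhost\r\nSubject: injected subject", "hello", "body text")
print(ok is not None)  # True
print(bad is None)     # True
```

So whatever code calls as_string() (or hands the message to a sender that serializes it) is the place to catch the exception.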
You can find more possible injections at "Is there any injection vulnerability in the body of an email?".
So I would say that you don't have to do anything special to stay on the safe side as:
- header injection is detected by default, so a hacker cannot add headers by messing with the value of subject
- the body is encoded/quoted, so if there's any evil sequence of characters that could break something, it should be neutralized
- even if the body weren't encoded, I think the only way to cause harm would be to change the message structure, e.g. by injecting a MIME boundary (for multipart messages; example: https://bugzilla.mozilla.org/show_bug.cgi?id=600464). But the boundary is a long random string, so it would probably be easier to guess your bank password.
- the <CRLF>.<CRLF> sequence terminates the message body in SMTP, but that is not a problem as MIME classes replace CRLF with LF:
>>> MIMEText('something\r\n.\r\nsomething2', 'plain', _charset='iso-8859-1').as_string()
'Content-Type: text/plain; charset="iso-8859-1"\nMIME-Version: 1.0\nContent-Transfer-Encoding: quoted-printable\n\nsomething\n.\nsomething2'
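To double-check the last point for both charsets used in this answer, here is a small sketch that looks for the raw terminator in the serialized message. The helper contains_raw_terminator is a made-up name, and it only inspects the output of as_string(), not what the transport does afterwards.

```python
from email.mime.text import MIMEText

TERMINATOR = "\r\n.\r\n"


def contains_raw_terminator(body, charset):
    """Report whether CRLF.CRLF survives in the serialized message."""
    serialized = MIMEText(body, "plain", charset).as_string()
    return TERMINATOR in serialized


body = "something\r\n.\r\nsomething2"
for charset in ("UTF-8", "iso-8859-1"):
    # UTF-8 bodies are base64-encoded; iso-8859-1 bodies are quoted-printable.
    print(charset, contains_raw_terminator(body, charset))
```

On my understanding both lines should report False: the base64 body contains no raw line breaks from the input, and the quoted-printable body has its CRLFs rewritten to LF.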