
I'm working on an NLP project in Python to classify emails. The main goal is to build a model that automatically redirects each email to the right department. I'm trying to build a dataset containing only the text of the customers' emails and their requests. I started by downloading the emails from the POP server with poplib, and that works well.

I'm looking for a way to decode any email regardless of its encoding. I'm not an expert in encodings, and the code I use doesn't always work; I can't figure out why. I've noticed that it fails on old messages, probably because they were archived in a different encoding. I need a method that detects the encoding and decodes the message systematically. I searched the web for two days and found nothing, only websites that offer to do the decoding, whereas I would like to integrate it directly into my code. I only need the body of the email.

Does such a package exist? And if so, which one?

Thanks a lot for reading.

  • iirc there is a few packages that try to do this(chardet I think is the one i used) ... they get it wrong sometimes if i recall my pain with this ... – Joran Beasley Jan 27 '21 at 18:13
  • You may have a look at [Python - Auto Detect Email Content Encoding](https://stackoverflow.com/questions/39235436/python-auto-detect-email-content-encoding) – AcK Jan 27 '21 at 18:21
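
For reference, here is a minimal sketch of one way to get the body out of raw messages like those returned by poplib, using only the standard-library `email` package. It is not taken from the question's code: the names `extract_body` and `FALLBACK_ENCODINGS` are hypothetical, and the fallback order (UTF-8, then cp1252, then Latin-1) is an assumption that may not suit every mailbox. The idea is to trust the charset declared in the message first, and only guess when it is missing or wrong; a detector such as chardet could be swapped in for the guessing step, with the caveat from the comment above that it can get things wrong.

```python
from email import message_from_bytes, policy

# Assumed fallback order for messages with a missing or wrong charset;
# adjust to the encodings that are plausible for your own mail.
FALLBACK_ENCODINGS = ("utf-8", "cp1252")


def extract_body(raw: bytes) -> str:
    """Return the text body of a raw email (plain text preferred over HTML)."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Pick the best body part, skipping attachments.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return ""

    # decode=True undoes the transfer encoding (base64, quoted-printable)
    # and returns the body as bytes, still in the original charset.
    payload = part.get_payload(decode=True) or b""

    # Try the declared charset first, then the assumed fallbacks.
    declared = part.get_content_charset()
    candidates = ([declared] if declared else []) + list(FALLBACK_ENCODINGS)
    for encoding in candidates:
        try:
            return payload.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that do not fit this encoding.
            continue

    # Latin-1 maps every byte to a character, so this never raises.
    return payload.decode("latin-1")
```

With poplib, the list of lines returned by `retr()` can be joined with `b"\r\n"` to form the `raw` argument. Old messages often fail because they declare no charset or an incorrect one, which is the case the fallback loop is meant to cover.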

0 Answers