I am working on an application that connects to a mail server using Python's POP3 library, parses the emails, and puts them into a database.

I have successfully parsed text emails, HTML emails, and attachments. Now I am stuck on emails that contain embedded images. The server shows cid:<some code> for the images in the src attribute of the img tag, and the image data arrives as bytes. I am not sure how to extract the images and map them to the CIDs.

Please suggest.

Thanks in advance.

Below is the email content I am getting:

Content-Type: multipart/alternative; 
               boundary="PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263"

 --PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: text/plain

 Hi, testing embedded images email!


 --PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: multipart/related; boundary="PHP-related-e0af773d09fadf5208f69aecffcb4de888824263"

 --PHP-alt-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: text/html

 <html>
 <head>
 <title>Test HTML Mail</title>
 </head>
 <body>
 <font color='red'>Hai, it is me!</font>
 Here is my picture: 
  <img src="cid:PHP-CID-e0af773d09fadf5208f69aecffcb4de888824263" />
 </body>
 </html>

 --PHP-related-e0af773d09fadf5208f69aecffcb4de888824263
 Content-Type: image/gif
 Content-Transfer-Encoding: base64
 Content-ID: <PHP-CID-e0af773d09fadf5208f69aecffcb4de888824263> 

 iVBORw0KGgoAAAANSUhEUgAAAEYAAAAgCAMAAACYXf7xAAAAGXRFWHRTb2Z0d2FyZQBBZG9iZSBJ
bWFnZVJlYWR5ccllPAAAAwBQTFRF////oNKWY6ZZTnc08/304+P/6/PsRHgpZYpWGHcTWqFWe7pz
WZNFwNa+Q2UqgpZ5JGcZ4ezj7e3/6Oj/tbW62tr/aadiK1sSUHQ6oKeSI0UM5PHkAAAAaZhifHx6
yMjKWHdJY5lbi6yFW5RU0+LSnq2VmZ6Mm8iS8vL/dXVzRERFJVUJrNalcrNtkZGRLnYslsWJ3e3d
7fXwstirWYJB3ergyeTI9vb/iIiIgoKBd6V0np6ce51rU2pDqMqlVVVWTnpFhcN7NTU2RYUqpbWd
rKysOHcn5vbql6eOMWYbMkUi+fn/uOStk6yLZGRm7f7tlLGKOXg20dvNIiIiGUUER4Q0InMcaYtf
3+/e3d3czd7KjY2Nnb6WtdOzKWkmhoaGUJNNjL+FhLt7jLp9IF0Z/v7/0tLRqrijVX9UTmZA+v38
Qko5SW5EVYA9JkwPMzwocnJub7RnfZpy3vPcaGhkhYWDbm5rhISIRoZGN0gxm6aQ/Pz/OYAyXm1V
pKSpeHh2Q1M5oqKgiaZ+dZ1vbqRaTVU4k7GFe6xqpr6c1+rb3uTcfcdx0d3Qk7ePhaJ6cqVsTp5H
xNzA1ezTVotS7e7uv968+v76xtPBPlczm7OVydfDdK1t+fn7+vT91NTddpRpVmNBlLyUgKRymZmW
u9a5dati9vr35eXugrFzTVY2/v//R5M5ial+zdbJcJJn8/jz+f73SV89EREReL1vob2TUVw7orGX
YmtU///+YYZNkaKGmdKUR106iIiD9/b5VWxNmbWOudy0j4+N+//9/v/8Dw8Pd5xnf3+INF8Yjp2D
frZ2cHB30ufZb3Bt2+HY3e3WqKqiLjcrUW09q8+xLmowOXAhmbiI4+Xnjr6P5O/n5/DkeK9mQEBE
8vf5//r/9fT4U5Q9hcqGlNKNDh0FlJSXA0UAC1cJGl0KWaZQwc69yN3K/f76drVuQn0iLTkZeJds
lq+Pv9HBN1YtV21Fkb6Bkb6KmLSHtNC5t9y5DikEhLZ/W3BLMEoddqVi4vfk////U8M4kgAAAQB0
Ankit Jaiswal

4 Answers


I assume you are using Python's email package? It should handle images just fine. If you need to decode the image yourself, you need to have a look at the encoding, in this case base64. There is a module for encoding and decoding base64 in the stdlib, too.

As for the mapping, get the Content-ID header from each image part and create a dict that maps content IDs to MIME parts. To resolve the URLs in src, check whether they start with 'cid:' (i.e. they refer to a MIME part inside the same message), strip off the prefix, and look them up in the dictionary you created before.
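
The mapping described above could be sketched roughly as follows. This is only an illustration, not the asker's code: the function names `build_cid_map` and `resolve_src` are made up for this example, and it assumes the message has already been parsed with the standard `email` package. The percent-decoding step is an addition based on how `cid:` URLs are defined, not something the answer mentions.

```python
import email
from urllib.parse import unquote


def build_cid_map(msg):
    """Map each Content-ID (angle brackets removed) to its MIME part."""
    cid_map = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid:
            # The header looks like "<some-id>"; the cid: URL omits the brackets.
            cid_map[cid.strip().strip("<>")] = part
    return cid_map


def resolve_src(src, cid_map):
    """Return the MIME part a 'cid:' URL points to, or None for other URLs."""
    if src.lower().startswith("cid:"):
        # cid: URLs may be percent-encoded, so decode before the lookup.
        return cid_map.get(unquote(src[4:]))
    return None
```

With a parsed message `msg`, `resolve_src("cid:...", build_cid_map(msg))` returns the image part, and `part.get_payload(decode=True)` then yields the decoded image bytes.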

Torsten Marek
  • Thanks for your reply. Yes, I am using the email package. I have no issues decoding and reading the base64 content; I have done that for attachments. The issue is in parsing and mapping the content to the cid part. – Ankit Jaiswal Dec 02 '10 at 07:12
  • What do you mean by parsing? The content ID does not carry any meaning apart from identity; it's just chosen to be unique within the document. – Torsten Marek Dec 02 '10 at 07:21
  • By parsing I mean that my code works fine for all types of emails except those with inline or embedded images. For those, it shows all the content I posted in the question in the body of the email. – Ankit Jaiswal Dec 02 '10 at 15:34

I fixed the issue by checking the Content-Disposition value and the cid of each part.

If the disposition is attachment, the file contents should be shown as an attachment to the email; if it is inline, the contents should be shown in the body.
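
A minimal sketch of that check, assuming the standard `email` package. The helper name `split_parts` is hypothetical, and treating a part that has a Content-ID but no Content-Disposition as inline is an assumption made here (the sample message in the question has no Content-Disposition on its image part).

```python
import email


def split_parts(msg):
    """Separate inline (cid-referenced) parts from real attachments."""
    inline = {}        # content id -> MIME part, shown inside the body
    attachments = []   # parts to list as downloadable attachments
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        # get_content_disposition() returns 'inline', 'attachment' or None.
        disposition = part.get_content_disposition()
        cid = part.get("Content-ID")
        if disposition == "attachment":
            attachments.append(part)
        elif cid:
            # Assumption: a Content-ID without an explicit "attachment"
            # disposition means the part is embedded in the HTML body.
            inline[cid.strip().strip("<>")] = part
    return inline, attachments
```

The `inline` dictionary can then be used to look up the part behind each `cid:` reference in the HTML, while `attachments` is handled the same way as ordinary attachments.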

Ankit Jaiswal
  • Here is an example. *In case of inline attachment* (or embedded image) the headers will be: ``` 'Content-Disposition', 'inline; filename="1.png"' 'Content-ID', '<178eefca98b2c91aec1>' ``` and the `img` tag will be like: ``` <img height="59" src="cid:178eefca98b2c91aec1" width="169"/> ``` **In case of actual attachment** the headers will be like: ``` 'Content-ID', '<178eefca98bee445dfe2>' 'Content-Disposition', 'attachment; filename="Ajay_pratap_devops.pdf"' ``` – Abhishek Gupta Apr 20 '21 at 12:12

I copied and pasted this email content. Even my formail client can't decode this mail correctly, so maybe the mail content is incorrect or incomplete.

Tony

This can be done using the headers of the attachment payload together with the img tag.

Here is an example.

In case of inline attachment (or embedded image) the headers will be:

'Content-Disposition', 'inline; filename="1.png"'
'Content-ID', '178eefca98b2c91aec1'

and the img tag will be like:

<img height="59" src="cid:178eefca98b2c91aec1" width="169"/>

In case of actual attachment the headers will be like:

'Content-ID', '178eefca98bee445dfe2'
'Content-Disposition', 'attachment; filename="Ajay_pratap_devops.pdf"'
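
Once the inline parts are known, one possible way to make the stored HTML self-contained is to replace each `cid:` reference with a `data:` URI. This is only a sketch of one option and not part of the answer above: the function name `embed_inline_images` and the `inline_parts` dictionary (content id without angle brackets mapped to its MIME part) are assumptions for illustration, and a regular expression is used instead of a real HTML parser to keep the example short.

```python
import base64
import re

# Matches src="cid:..." or src='cid:...' and captures the quote and the id.
CID_SRC = re.compile(r'src=(["\'])cid:(.+?)\1', re.IGNORECASE)


def embed_inline_images(html, inline_parts):
    """Replace src="cid:..." with a data: URI built from the matching part."""
    def replace(match):
        quote, cid = match.group(1), match.group(2)
        part = inline_parts.get(cid)
        if part is None:
            return match.group(0)  # unknown reference: leave it untouched
        raw = part.get_payload(decode=True)  # decoded image bytes
        data = base64.b64encode(raw).decode("ascii")
        return f"src={quote}data:{part.get_content_type()};base64,{data}{quote}"

    return CID_SRC.sub(replace, html)
```

If the images are stored separately in the database, the same substitution could instead insert whatever URL the application serves them from.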
Abhishek Gupta