
Hi, I am using the Gmail API to fetch a message and extract its body content. That works, and I can decode the body to text with the code below. However, the decoded text occasionally comes with a large block of CSS code that I don't want.

Is there a way to strip any HTML and CSS code that may come in the decoded text? Thanks.

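// Fetch one message and log its decoded body.
gapi.client.gmail.users.messages.get({
  'userId': userId,
  'id': message.id
}).then(function(response) {
  var payload = response.result.payload;
  // Multipart messages keep the body in parts; use the first part, else the top-level body.
  var Body_obj = (payload.parts) ? payload.parts[0].body.data : payload.body.data;

  // Gmail returns base64url data; map it back to standard base64 before calling atob.
  console.log( atob(Body_obj.replace(/-/g, '+').replace(/_/g, '/')) );
});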
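
As a side note on the decoding step: `atob` returns one character per byte, so non-ASCII text can come out garbled. Below is a minimal sketch of a UTF-8-aware variant. The helper name `decodeBase64Url` is hypothetical, and it assumes the part is UTF-8 encoded, which may not hold for every message.

```javascript
// Hypothetical helper: decode a base64url string (as found in body.data) to text.
function decodeBase64Url(data) {
  // Map the URL-safe alphabet back to standard base64.
  const base64 = data.replace(/-/g, '+').replace(/_/g, '/');
  // atob yields a "binary string": one character per byte.
  const binary = atob(base64);
  const bytes = Uint8Array.from(binary, (ch) => ch.charCodeAt(0));
  // Assumption: the bytes are UTF-8; the part's declared charset may differ.
  return new TextDecoder('utf-8').decode(bytes);
}
```

This only changes how the bytes are turned into text; it does not remove any CSS that is already present in the part.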

Here's an example of all the CSS junk I get with the decoded message text:

p{            margin:10px 0;            padding:0;            }            table{            border-collapse:collapse;            }            h1,h2,h3,h4,h5,h6{            display:block;            margin:0;            padding:0;            }            img,a img{            border:0;            height:auto;            outline:none;            text-decoration:none;            }            body,#bodyTable,#bodyCell{            height:100%;            margin:0;            padding:0;            width:100%;            }            .hrdPreviewText{            display:none !important;            }  (so much CSS I do not want...) .hrdTextContent li{            font-size:14px !important;            line-height:150% !important;            }            }

Now some real text I want.                                                                                                                          

  • I've also tried striptags, but that did not fully remove the CSS part from the text. – gmail_api_question Aug 22 '20 at 06:33
  • Load the HTML into a DOM parser and iterate over all text nodes. – CherryDT Aug 22 '20 at 11:54
  • Sounds like you are decoding the HTML part and not the plain part of your message. You should inspect the contents of your message; you can do that from the UI by clicking the "three dots" menu and choosing "Show original" (including those contents in your question would help troubleshooting). [Try this API](https://developers.google.com/gmail/api/reference/rest/v1/users.messages/get) is also useful for testing. – ziganotschka Aug 24 '20 at 11:18
  • @CherryDT Not really helpful as the entire decoded string is treated as one text element when I try using a DOM parser. – gmail_api_question Aug 24 '20 at 17:17
  • I don't understand what you mean exactly... You should parse the original HTML and not that already badly "decoded" string (since that is lacking the semantics to know what's what) – CherryDT Aug 24 '20 at 17:18
  • @ziganotschka I double-checked and can confirm that I am decoding the plain part of the message. When I check the message part type, it is "text/plain". The other part is "text/html", and that part contains only HTML/CSS code. The plain text part in my original question contains BOTH the plain text I want and the CSS code that I do not want. I will try to upload the base64-encoded message content for further reference. – gmail_api_question Aug 24 '20 at 17:19
  • You need the part that has the MIME type `text/html`: https://stackoverflow.com/a/24433196/1871033 – CherryDT Aug 24 '20 at 17:20
  • @CherryDT So I should decode the part that has the HTML MIME type and check whether there are any text nodes/text content after I load the decoded part into a DOM parser? – gmail_api_question Aug 24 '20 at 17:21
  • What I was trying to explain is that apparently you get already badly parsed "plain" text in the first place (possibly even garbled by the email sender, who no longer thinks about text-only clients), so instead you have to get the HTML (if available) and _properly_ parse it, doing a better job. – CherryDT Aug 24 '20 at 17:22
  • @CherryDT Thanks I checked out the HTML part and there appears to be potential for better luck there – gmail_api_question Aug 24 '20 at 17:28
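
The approach suggested in the comments (take the `text/html` part and parse it properly) could be sketched roughly as follows. This is a sketch, not a tested solution for this particular message: `findPart` and `htmlToText` are hypothetical helper names, the payload is assumed to follow the Gmail `MessagePart` shape (`mimeType`, `body.data`, nested `parts`), and `htmlToText` relies on `DOMParser`, so it only runs in a browser.

```javascript
// Hypothetical helper: depth-first search of a message payload for the
// first part that has the given MIME type and carries body data.
function findPart(part, mimeType) {
  if (!part) return null;
  if (part.mimeType === mimeType && part.body && part.body.data) return part;
  for (const child of part.parts || []) {
    const found = findPart(child, mimeType);
    if (found) return found;
  }
  return null;
}

// Hypothetical helper (browser only): extract the text of an HTML string,
// skipping <style> and <script> elements.
function htmlToText(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // Remove elements whose content is code rather than message text.
  doc.querySelectorAll('style, script').forEach((el) => el.remove());
  // Collapse runs of whitespace left behind by the markup.
  return doc.body.textContent.replace(/\s+/g, ' ').trim();
}

// Possible usage inside the .then callback from the question:
//   const part = findPart(response.result.payload, 'text/html');
//   const html = atob(part.body.data.replace(/-/g, '+').replace(/_/g, '/'));
//   console.log(htmlToText(html));
```

Searching recursively matters because the HTML part is often nested (for example inside `multipart/alternative`), so `payload.parts[0]` is not guaranteed to be the part you want. Note that `textContent` does not insert separators between block elements, so adjacent paragraphs may run together.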

0 Answers