I am trying to remove a section from some HTML. Here is an example of what I am working with (some of the specific div ids might change, but the idea is the same):
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir="ltr">---------- Forwarded message ---------<br>data data data<br></div><br><br>
<div id="itemID" style="margin:0px"><div style="margin:0px">
<html i want to keep etc>
I want to transform this so it looks like:
<div id="itemID" style="margin:0px"><div style="margin:0px">
<html i want to keep etc>
And as another example, this HTML:
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
<div dir="headerline">---------- Forwarded message ---------<br>data data data<br></div><br><br>
<div id="itemID2" style="margin:10px"><div style="margin:10px">
<html i want to keep etc>
Should be transformed to look like this:
<div id="itemID2" style="margin:10px"><div style="margin:10px">
<html i want to keep etc>
In other words, look for "Forwarded message" in the first or second line and, if you find it, delete every line up to and including that one. Right now the working regex looks like this:
var HTMLbodynew = HTMLbody.replace(/\n.+Forwarded Message.+\n/i, "");
However, as described in a notorious Stack Overflow post, I shouldn't be using regex to parse HTML. Is there a way to accomplish this without regex?
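
For reference, the rule described above ("find the marker near the top, then drop everything up to and including that line") can be sketched with plain string operations instead of a regex. This is only a rough sketch, not a real HTML parser: it treats the body as lines of text. The function name `stripForwardedHeader` and the `maxLines` parameter are hypothetical, and the default of 3 is an assumption chosen because the marker sits on the third line of the samples when the two header lines are counted.

```javascript
// Hypothetical helper (not from any library): removes every line up to and
// including the first line that contains `marker`, but only when that line
// is among the first `maxLines` lines. Otherwise the body is returned as-is.
function stripForwardedHeader(body, marker, maxLines) {
  marker = (marker || "Forwarded message").toLowerCase();
  maxLines = maxLines || 3; // assumption: the marker appears within the first 3 lines

  var lines = body.split("\n");
  var limit = Math.min(maxLines, lines.length);

  for (var i = 0; i < limit; i++) {
    // Case-insensitive substring check, no regex involved.
    if (lines[i].toLowerCase().indexOf(marker) !== -1) {
      return lines.slice(i + 1).join("\n");
    }
  }
  return body; // marker not found near the top: leave the body untouched
}

// Usage with the first sample from the question.
var sample = [
  'Content-Type: text/html; charset="UTF-8"',
  'Content-Transfer-Encoding: quoted-printable',
  '<div dir="ltr">---------- Forwarded message ---------<br>data data data<br></div><br><br>',
  '<div id="itemID" style="margin:0px"><div style="margin:0px">',
  '<html i want to keep etc>'
].join("\n");

var HTMLbodynew = stripForwardedHeader(sample);
console.log(HTMLbodynew);
```

Because the match is limited to the first few lines, a later occurrence of the phrase inside the content being kept would not trigger a deletion. If the marker can only ever be on the first or second line, passing `2` as the third argument tightens the check.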