
I am trying to remove a section from some HTML. Here is an example of what I am working with (some of the specific div ids might change, but the idea is the same):

Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir="ltr">---------- Forwarded message ---------<br>data data data<br></div><br><br>
<div id="itemID" style="margin:0px"><div style="margin:0px">
<html i want to keep etc>

I want to transform this so it looks like:

<div id="itemID" style="margin:0px"><div style="margin:0px">
<html i want to keep etc>

And as another example, this HTML:

Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir="headerline">---------- Forwarded message ---------<br>data data data<br></div><br><br>
<div id="itemID2" style="margin:10px"><div style="margin:10px">
<html i want to keep etc>

Should be transformed to look like this:

<div id="itemID2" style="margin:10px"><div style="margin:10px">
<html i want to keep etc>

In other words, look for "Forwarded message" in the first or second line and, if it is there, delete every line up to and including that one. Right now the working regex looks like this:

var HTMLbodynew = HTMLbody.replace(/\n.+Forwarded message.+\n/i, "");

However, as described in a notorious Stack Overflow post, I shouldn't be using regex to parse HTML. Is there a way to accomplish this without regex?

garson

1 Answer


Try this:

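// Remove the forwarded-message wrapper; ?. avoids a TypeError when no such div exists.
document.querySelector("div[dir='ltr']")?.remove();

// Remove every <br> element in the document.
for (const brElement of document.querySelectorAll("br")) {
    brElement.remove();
}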
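
If the `dir` attribute cannot be relied on, the rule stated in the question (find "Forwarded message" near the top and drop everything up to and including that line) can be sketched as plain string handling with no regex and no DOM. This is only a minimal sketch: the function name `stripForwardedHeader`, the `maxLinesToScan` parameter and its default of 2 are assumptions made for illustration, and it treats the HTML as text, so it depends on the forwarded block sitting on its own line as in the examples.

```javascript
// Hypothetical helper: the name, the marker text, and the maxLinesToScan
// default are assumptions for illustration, not part of the original question.
function stripForwardedHeader(htmlBody, maxLinesToScan = 2) {
  const marker = "forwarded message";
  // Splitting on "\n" and re-joining with "\n" leaves the kept lines untouched.
  const lines = htmlBody.split("\n");
  const limit = Math.min(lines.length, maxLinesToScan);

  for (let i = 0; i < limit; i++) {
    // Case-insensitive substring check, no regex involved.
    if (lines[i].toLowerCase().includes(marker)) {
      // Drop every line up to and including the one holding the marker.
      return lines.slice(i + 1).join("\n");
    }
  }
  // Marker not found near the top: return the input unchanged.
  return htmlBody;
}

// Usage with the first example from the question (MIME headers already removed).
const sample = [
  '<div dir="ltr">---------- Forwarded message ---------<br>data data data<br></div><br><br>',
  '<div id="itemID" style="margin:0px"><div style="margin:0px">',
  '<html i want to keep etc>',
].join("\n");

console.log(stripForwardedHeader(sample));
```

If the string still carries the `Content-Type` and `Content-Transfer-Encoding` lines, a larger `maxLinesToScan` would be needed for the marker line to fall inside the scanned range.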
Anton Dikarev
  • Thank you, but unfortunately not every HTML document this needs to work on has the "ltr" (also, some of the divs in the html I want to keep later have "ltr"). I have updated the question to try to make this more clear. – garson Mar 17 '21 at 13:49