Hi, I have this HTML:

<div class="c-disruption-item c-disruption-item--line"> 
 <h3 class="c-disruption-item__title" id="11e62827-9f9c-48b2-8807-09f6b6ebeec6" name="11e62827-9f9c-48b2-8807-09f6b6ebeec6"> <a>Closure of London Road</a> </h3> 
 <ul class="c-disruption__affected-entities"> 
  <li>Affected routes:</li> 
  <li> <a href="/services/RB/X4#disruptions" class="line-block" style="background-color: #A38142; color:#FFFFFF"> 
    <div class="line-block__contents">
      X4 
    </div> </a> </li> 
 </ul>
 <p>The left turn from Wiltshire Road on to London Road will be closed between 10.00pm and 5.00am on the nights of 27/28 and 28/29 April 2020.<br> <br> Lion X4 affected as follows:-<br> <br> Journeys towards Bracknell will be diverted and unable to serve the Seaford Road bus stop. Please use the Three Frogs bus stop instead.<br> <br> Journeys towards Reading are not affected and should follow normal route.<br> <br> We are sorry for the inconvenience caused.</p> 
</div>

And I want to select whatever comes before and after the <ul></ul> section, meaning everything except this:

   <ul class="c-disruption__affected-entities"> 
      <li>Affected routes:</li> 
      <li> <a href="/services/RB/X4#disruptions" class="line-block" style="background-color: #A38142; color:#FFFFFF"> 
        <div class="line-block__contents">
          X4 
        </div> </a> </li> 
     </ul>

But if this section does not exist, I want to select everything.

I tried this pattern: ([\W\w]+(?=\<ul)|(?<=ul>)[\W\w]+), but it doesn't work if the <ul></ul> does not exist. The selection has to be done with regex alone. Does anybody have an idea?

thanks

zer00ne
1 Answer

Regex is the last resort (at least when using JavaScript). Your objective is better achieved by traversing the DOM, not by scanning a huge string trying to match error-prone patterns.

The task amounts to finding the unordered list with the class name "c-disruption__affected-entities" and then excluding said <ul>.

Regex

A string is the only data type regex is equipped to deal with, so all of the HTML (which is much more than just a string) needs to be converted into a string.

let htmlString = document.body.innerHTML;

Valid HTML may use double or single quotes, and may contain runs of whitespace, multiple empty lines, etc. A regex must either be written to handle such inconsistencies, or be written to target a pattern so specific that it is worthless outside of that particular situation. The htmlString will most likely be a hot mess of thickly layered HTML sporting huge attribute values like "c-disruption-item c-disruption-item--line". Anyway, here's a statement using the string method .replace() with a regex. It's untested because it's neither efficient nor practical to use; a complete waste of time:

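// Lazy quantifier and [^>]* keep the match inside one <ul>...</ul> block;
// if the <ul> is absent, replace() returns htmlString unchanged.
let result = htmlString.replace(/<ul\b[^>]*c-disruption__affected-entities[^>]*>[\s\S]*?<\/ul>/i, '');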
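Since OP asked for the parts before and after the <ul>, and for everything when the <ul> is missing, the same idea can be sketched with .split() instead of .replace(). This is only a minimal sketch under assumptions: the helper name partsAroundList and the shortened sample strings are hypothetical and not part of the original page, and the pattern assumes the targeted <ul> contains no nested <ul>.

```javascript
// Hypothetical helper, for illustration only.
// Matches one <ul ...> ... </ul> block whose opening tag mentions the class.
// Assumption: the targeted <ul> has no nested <ul> inside it.
const UL_BLOCK = /<ul\b[^>]*c-disruption__affected-entities[^>]*>[\s\S]*?<\/ul>/i;

function partsAroundList(html) {
  // split() drops the matched block and returns the text around it.
  // When nothing matches, it returns a one-element array holding the
  // whole string, which covers the "select all" case.
  return html.split(UL_BLOCK);
}

// Shortened sample markup (hypothetical, modelled on the question's HTML).
const withList =
  '<div><h3>Title</h3> <ul class="c-disruption__affected-entities"><li>X4</li></ul> <p>Text</p></div>';
const withoutList = '<div><h3>Title</h3> <p>Text</p></div>';

console.log(partsAroundList(withList));    // two parts: before and after the <ul>
console.log(partsAroundList(withoutList)); // one part: the whole string
```

The usual caveats about parsing HTML with regex still apply, so this is best kept to small, predictable fragments like the one in the question.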

DOM

A value like ul.c-disruption__affected-entities has more meaning as HTML, and the element is accessible as a DOM object in several standard ways. The following demo features a function that easily meets OP's objective.

Demo

Note: Details are commented in demo.

/**
 * Create a documentFragment and move the excluded node
 * (along with any descendants) to it. Although the
 * excluded node is no longer part of the DOM, a
 * documentFragment allows any of its descendant nodes to
 * reattach to the DOM however and whenever.
 * Note: this removes the excluded node from the live page.
 ***
 * @param {String} selector  - A CSS selector string of a
 *                             tag that needs to be
 *                             returned without the
 *                             excluded tag.
 * @param {String} exclusion - A CSS selector string of the
 *                             tag that needs to be
 *                             removed from the returned
 *                             value.
 * @returns {String} The outerHTML of the selected tag
 *                   without the excluded tag.
 */
const excludeNode = (selector, exclusion) => {
  const frag = document.createDocumentFragment();
  const area = document.querySelector(selector);
  const excl = area.querySelector(exclusion);
  // querySelector() returns null when nothing matches; in that
  // case skip the move so the whole tag is returned unchanged.
  if (excl) frag.appendChild(excl);
  return area.outerHTML;
};

console.log(excludeNode(".c-disruption-item.c-disruption-item--line", ".c-disruption__affected-entities"));
:root {
  overflow-y: scroll;
  height: 200vh
}
<div class="c-disruption-item c-disruption-item--line">
  <h3 class="c-disruption-item__title" id="11e62827-9f9c-48b2-8807-09f6b6ebeec6" name="11e62827-9f9c-48b2-8807-09f6b6ebeec6"> <a>Closure of London Road</a> </h3>
  <ul class="c-disruption__affected-entities">
    <li>Affected routes:</li>
    <li>
      <a href="/services/RB/X4#disruptions" class="line-block" style="background-color: #A38142; color:#FFFFFF">
        <div class="line-block__contents">
          X4
        </div>
      </a>
    </li>
  </ul>
  <p>The left turn from Wiltshire Road on to London Road will be closed between 10.00pm and 5.00am on the nights of 27/28 and 28/29 April 2020.<br> <br> Lion X4 affected as follows:-<br> <br> Journeys towards Bracknell will be diverted and unable to serve
    the Seaford Road bus stop. Please use the Three Frogs bus stop instead.<br> <br> Journeys towards Reading are not affected and should follow normal route.<br> <br> We are sorry for the inconvenience caused.</p>
</div>
zer00ne