I am trying to work out logic that removes &nbsp; entities if they exist within a div element. I tried replacing the &nbsp; entity as follows:

(div.html().replace(/^\s*&nbsp;/m, ''));

The problem is that I want to remove only the &nbsp; entities that sit at the immediate-children level of an element. For example, suppose we have the following HTML content:

<div class="block">
  <table>
    <tr>
      <td>
       &nbsp; 1 
      </td>
    </tr>
    <tr>
      <td>
        2
      </td>
    </tr>
  </table>
  &nbsp;
</div>

Take the div element with class name block as the parent element. I would like to look only at the direct children of this element, not inside those child elements. That means the &nbsp; within the td element should be left alone, and only the &nbsp; at the end (right after the table ends) should be removed. Could anyone think of a solution to this problem?

Thanks

  • _"Could anyone think of a solution to solve this problem?"_ - sure, stop messing around on HTML with regex, and go with DOM methods instead ... Loop over the child nodes of the div, and if they are a mere text node, check what it contains. (You might apply regex for _that_ part, replacing it in the content of the individual text node.) – CBroe Sep 16 '21 at 08:08
  • @CarstenLøvboAndersen As far as I can tell that's what OP is doing currently, which also removes the &nbsp inside the cell, which they don't want –  Sep 16 '21 at 08:15
  • You need this: https://developer.mozilla.org/en-US/docs/Web/API/Node/childNodes (jQuery is of no use here) A suitable regex to match textNodes against would be `/^((&nbsp)|\s)+$/` –  Sep 16 '21 at 08:16
  • Does this answer your question? [How do I select text nodes with jQuery?](https://stackoverflow.com/questions/298750/how-do-i-select-text-nodes-with-jquery) – freedomn-m Sep 16 '21 at 08:24
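
A note on the regex suggested in the comments above, as a minimal sketch: once the HTML is parsed, a text node's nodeValue holds the decoded character U+00A0 rather than the literal string "&nbsp;", and JavaScript's `\s` already matches U+00A0. The helper name `isBlankText` is hypothetical and not from the thread.

```javascript
// Hypothetical helper (not from the thread): true when a text node's value
// contains nothing but whitespace, including non-breaking spaces (U+00A0).
function isBlankText(value) {
  // \s in JavaScript covers spaces, tabs, newlines and U+00A0.
  return /^\s*$/.test(value);
}

console.log(isBlankText('\n  \u00a0\n'));      // true
console.log(isBlankText('\n  \u00a0table\n')); // false
// The literal entity text never appears in a parsed text node:
console.log(/&nbsp/.test('\u00a0'));           // false
```

So a plain whitespace test on each direct text node should be enough to tell "only &nbsp; and line breaks" apart from "&nbsp; followed by real text".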

1 Answer

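Test each child node of the div. For every direct text node, strip the non-breaking-space character (\u00a0) from its nodeValue, and remove the node only if nothing but whitespace is left. Iterate over a copy of childNodes, because the list is live and removing nodes while looping over it skips siblings.

const div = document.querySelector('.block');

function remove(div) {
  // Copy the live NodeList so removals don't skip siblings.
  const nodes = Array.from(div.childNodes);
  console.log(`Current nodes: ${div.childNodes.length}`);
  nodes.forEach(node => {
    // Only direct text-node children; text inside <td> is untouched.
    if (node.nodeType === Node.TEXT_NODE) {
      // Strip non-breaking spaces (U+00A0) but keep any other text.
      node.nodeValue = node.nodeValue.replace(/\u00a0/g, '');
      // Drop the node if only whitespace is left.
      if (node.nodeValue.trim() === '') {
        div.removeChild(node);
      }
    }
  });
  console.log(`Remaining nodes: ${div.childNodes.length}`);
}

remove(div);
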
<div class="block">
  <table>
    <tr>
      <td>
       &nbsp; 1 
      </td>
    </tr>
    <tr>
      <td>
        2
      </td>
    </tr>
  </table>
  &nbsp;
  &nbsp;table
</div>
Andy
  • Thanks for this response, but with the following check: if (/\u00a0/g.test(node.nodeValue)), I noticed that when the nodeValue is something like "&nbsp;table", test() still matches, so the whole node is removed and its text content is lost. – Vaishnav Sivadas Sep 16 '21 at 11:59
  • Just adjust the regex. I've updated my answer. – Andy Sep 16 '21 at 12:20
  • I tried this, but the catch is that in the response I get for the content, the whitespace sometimes comes through as \n\t\t\t rather than &nbsp;, which seems weird, and so that whitespace is not handled :/ – Vaishnav Sivadas Sep 16 '21 at 13:24
  • Adding to the previous comment, the sequence also seems to be appended to the text data like I mentioned in the first comment: " \n\t\t\tsample text" – Vaishnav Sivadas Sep 16 '21 at 13:38
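
For the follow-up cases in the last two comments (whitespace arriving as \n\t\t\t, and whitespace attached to real text), one possible approach is to trim each direct text node instead of testing for \u00a0 alone. This is a sketch under assumptions, not the answer author's method: the helper names `cleanTextValue` and `cleanDirectTextNodes` are hypothetical, and it relies on `String.prototype.trim()` stripping U+00A0 along with spaces, tabs and newlines.

```javascript
// Hypothetical helper: cleans one text-node value.
// Returns null when the value is only whitespace (the caller removes the
// node); otherwise returns the text with surrounding whitespace stripped.
function cleanTextValue(value) {
  // trim() removes spaces, tabs, newlines and non-breaking spaces (U+00A0).
  const trimmed = value.trim();
  return trimmed === '' ? null : trimmed;
}

// Applies cleanTextValue to the direct text-node children of `parent` only.
// Text inside nested elements (for example a <td>) is never visited.
function cleanDirectTextNodes(parent) {
  const TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
  // Snapshot first: childNodes is live and changes as nodes are removed.
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType !== TEXT_NODE) continue;
    const cleaned = cleanTextValue(node.nodeValue);
    if (cleaned === null) {
      parent.removeChild(node);
    } else {
      node.nodeValue = cleaned;
    }
  }
}
```

In a browser this would be called as `cleanDirectTextNodes(document.querySelector('.block'))`. Note that it also drops ordinary whitespace-only text nodes between child elements, which may matter if the children are inline elements that rely on that spacing.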