Replace a recurring string with part of it

Question

Let's say I have the following text (an example):

<table>
  <tr>
    <td>
      <span>col1</span>
    </td>
    <td>col2</td>
  </tr>
  <tr>
    <td>text1</td>
    <td>
      <span>text2</span>
    </td>
  </tr>
</table>

I want to replace every <span>%</span> with just % (where % stands for the span's contents), and I've come up with a solution like this:

replace(/<span>(.*)<\/span>/gi, function(full, text){return text;})

It matches everything from the first <span> through the last </span> as a single occurrence, so the whole structure of my table is messed up.

How can I tell JS to replace each occurrence individually instead of everything at once? The solution needs to be in JavaScript. I hope my example is not too simplistic or buggy to be clear.

obligatory http://stackoverflow.com/questions/590747/using-regular-expressions-to-parse-html-why-not — Mike Samuel, Feb 02 '12 at 17:22
it is possible to do certain operations on HTML using regular expressions, but you are very likely to get it wrong, and the resulting code will be very brittle -- even if correct, minor changes to requirements will require a complete rewrite. — Mike Samuel, Feb 02 '12 at 17:25

Mike Samuel · Accepted Answer · 2012-02-02T17:31:00.630

.* is greedy, so will happily match </span>...<span>. Replace it with [\s\S]*? which is non-greedy, but (unlike .) matches any character including newlines.

/<span>([\s\S]*?)<\/span>/gi
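As a minimal sketch of the difference, assuming a hypothetical one-line input with two spans (the string and variable names below are illustrative, not from the question):

```javascript
// Hypothetical input: two separate spans on one line.
var html = '<td><span>a</span></td><td><span>b</span></td>';

// Greedy: (.*) runs from the first <span> to the LAST </span>,
// so both spans are swallowed by a single match.
var greedy = html.replace(/<span>(.*)<\/span>/gi, '$1');

// Non-greedy: ([\s\S]*?) stops at the NEAREST </span>,
// so each span is matched and replaced on its own.
var lazy = html.replace(/<span>([\s\S]*?)<\/span>/gi, '$1');

console.log(greedy); // <td>a</span></td><td><span>b</td>
console.log(lazy);   // <td>a</td><td>b</td>
```

The greedy result still contains a stray </span> and <span>, which is the broken structure described in the question.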

Better yet, parse it properly to a DOM and then change the spans there.

EDIT:

Rather than learn how to kind-of-parse HTML with regular expressions, your time would be better spent learning the DOM manipulation tools that are better suited to this problem.

To parse the HTML, you can do

var container = document.createElement('DIV');
container.innerHTML = myStringOfHTML;

Then

container.getElementsByTagName('SPAN')

will get all the SPANs.

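Finding the ones that contain only a text node is simple:

var spans = container.getElementsByTagName('SPAN');
for (var i = 0, n = spans.length; i < n; ++i) {
  var span = spans[i];  // nothing is removed here, so index with i
  if (span.childNodes.length === 1 && span.firstChild.nodeType === 3) {
    // the span contains a single text node (nodeType 3); do work here
  }
}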

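To fold the children into the parent:

var spans = container.getElementsByTagName('SPAN');
// getElementsByTagName returns a live collection: each removed span drops
// out of it, so spans[0] is always the next span still to be processed.
for (var i = 0, n = spans.length; i < n; ++i) {
  var span = spans[0];
  while (span.firstChild) {
    // insertBefore(newNode, referenceNode): move each child in front of the span
    span.parentNode.insertBefore(span.firstChild, span);
  }
  span.parentNode.removeChild(span);
}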

I agree with using the DOM, but unfortunately I cannot edit it, as the current HTML page needs to stay the same for the user. Your remark about the DOM did lead me to this solution, though (I'm using jQuery): http://api.jquery.com/clone/ I'm just hoping that "cloning" a part of the document won't slow down the browser. — Nicolas, Feb 02 '12 at 17:34
@Nicolas, Yeah. Sometimes performance necessitates complicated/brittle code, but I would try the simple solution first. — Mike Samuel, Feb 02 '12 at 17:47
Your solution worked perfectly, thank you. Still, I was curious about your DOM solution, and I think that, although it is more robust, it could come at a cost on the user's side. — Nicolas, Feb 02 '12 at 17:54

score 0 · Answer 2 · answered Feb 02 '12 at 17:41

I get that HTML and regexes don't go well together generally, and @MikeSamuel has a good solution for using the DOM, but it is really simple to do with regex (in this case).

var text = '<td>Hello</td> <td><span>WORLD</span></td> <td>Begin</td> <td><span>AGAIN</span></td>';
text.replace(/<span>([\s\S]*?)<\/span>/gi, '$1');

-> "<td>Hello</td> <td>WORLD</td> <td>Begin</td> <td>AGAIN</td>"
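The same replacement can be tried on the multi-line table from the question. This is only a sketch: the variable names (table, stripped, tdCount) are illustrative, and it assumes every span is a bare <span> with no attributes and no nested spans.

```javascript
// The table from the question, rebuilt as a multi-line string.
var table = [
  '<table>',
  '  <tr>',
  '    <td>',
  '      <span>col1</span>',
  '    </td>',
  '    <td>col2</td>',
  '  </tr>',
  '  <tr>',
  '    <td>text1</td>',
  '    <td>',
  '      <span>text2</span>',
  '    </td>',
  '  </tr>',
  '</table>'
].join('\n');

// Replace each span with its own contents, one match at a time.
var stripped = table.replace(/<span>([\s\S]*?)<\/span>/gi, '$1');

// Count the remaining <td> tags to confirm the table structure survived.
var tdCount = (stripped.match(/<td>/g) || []).length;

console.log(/<\/?span>/i.test(stripped)); // false
console.log(tdCount);                     // 4
```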
