
I have a list of strings, each of which should be wrapped in an element with some class in the HTML markup, either before or after appending to the DOM (both options are acceptable). The issue is that the markup has a lot of "garbage" inside (formatting tags, styles, wrapping in other DOM elements), which should be left intact after the replacement. See the example below:

<custom-tag>Word4<span style='font-family:"Candara","sans-serif"'>Word1 Word2</span>Word3</custom-tag>
<custom-tag>Word1<span style='font-family:"Candara","sans-serif"'>Word2<br>Word1<b>Word6</b></span></custom-tag>

Given the list ['Word1', 'Word4', 'd6'], I should receive as the result:

<custom-tag><span class="replaced">Word4</span><span style='font-family:"Candara","sans-serif"'><span class="replaced">Word1</span> Word2</span>Word3</custom-tag>
<custom-tag><span class="replaced">Word1</span><span style='font-family:"Candara","sans-serif"'>Word2<br><span class="replaced">Word1</span><b>Wor<span class="replaced">d6</span></b></span></custom-tag>

So:

  1. Replace only plain strings, don't touch tags and styles
  2. Replace all strings from the list in each content (not only the first one)

I started with a regex and wrote one that matches the content containing a single term from the list:

<custom-tag>.*?|(Word1).*?<\/custom-tag>

Unfortunately I'm not an expert in regex, so I need some help. Ideally it should be one regex that matches all strings from the list and excludes the tags and styles. Another option is to write a script that uses the DOM API and does the same as described above. Thank you for any ideas.

kirill.buga
  • Petty and probably irrelevant question: why is the wrapper a `<div>`, which is a block element? Almost by definition, the elements are replaced inline, which suggests they should be wrapped in a `<span>`. – Stephen P Mar 22 '16 at 22:16
  • @StephenP :D that's why I tried to fix what you mentioned using CSS `inline`. Good comment. – Roko C. Buljan Mar 22 '16 at 22:18
  • @StephenP you absolutely have a point. Seems it was too late for asking a question :) Will amend the question. – kirill.buga Mar 23 '16 at 08:40

1 Answer


var list = ['Word1', 'Word4', 'd6', 'red'];

var query = list.join("|");
// Uncomment if you cannot trust your `list` Array values
// query = query.replace(/[<>)(.]?/g, "");

var reg = new RegExp("(?![^<]+>)("+ query +")", "ig");

$("#source").html(function(i, html){
  return html.replace(reg, "<div class='replaced'>$1</div>");
});
.replaced{display:inline; background:gold;}
<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>
<div id="source">
  <custom-tag>Word4<span style='font-family:"Candara","sans-serif"; color:red;'>Word1 Word2</span>Word3</custom-tag>
  <custom-tag>Word1<span style='font-family:"Candara","sans-serif"'>Word2<br>Word1<b>Word6</b></span></custom-tag>
</div>

What the above does is basically this: Regex101.com Explained

The negative lookahead above prevents matches between the < and > of a tag, and therefore also skips matches in attributes. For example, I've used "red" as one of the query strings, but even though the source contains color:red;, there was no match there. A match would otherwise (logically) result in a total mess: attribute text wrapped in a DIV :)
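To see the tag-skipping behaviour without a browser, here is a minimal sketch of the same pattern applied to a plain string. The helper names `escapeRegExp` and `wrapTerms` are made up for illustration and are not part of the snippet above. It also assumes you want a `<span>` wrapper, as in the question's expected output, and that the terms should be escaped so regex metacharacters in them are taken literally.

```javascript
// Hypothetical helper: escape regex metacharacters so each term matches literally.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: wrap every occurrence of any term that is not inside a tag.
function wrapTerms(html, terms) {
  const query = terms.map(escapeRegExp).join("|");
  // (?![^<]+>) rejects a position when a ">" can be reached before any "<",
  // which is the case inside a tag (names, attributes, attribute values).
  const reg = new RegExp("(?![^<]+>)(" + query + ")", "ig");
  return html.replace(reg, '<span class="replaced">$1</span>');
}

const source =
  '<custom-tag>Word4<span style=\'font-family:"Candara","sans-serif"; color:red;\'>' +
  'Word1 Word2</span>Word3</custom-tag>';

// "red" is in the list, but the only "red" in the source sits inside the style attribute.
const result = wrapTerms(source, ["Word1", "Word4", "d6", "red"]);
console.log(result);
```

Word4 and Word1 get wrapped, while the `color:red;` inside the style attribute is left alone because the lookahead finds a closing `>` ahead of it.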

Disclaimer: Must read: RegEx match open tags except XHTML self-contained tags
If you feel cool after reading the above link and you don't have to be so picky about the "several" issues mentioned there... you're good to go.
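For a feel of what being "picky" can mean here, this is a small sketch of two inputs where the lookahead approach quietly does nothing. It assumes a raw HTML string as input (markup read back through innerHTML would normally have a text `>` serialized as `&gt;`), and the `wrap` helper and the fixed term are only for illustration.

```javascript
// Same lookahead pattern as above, with a single fixed term for illustration.
const reg = /(?![^<]+>)(Word1)/gi;
const wrap = (html) => html.replace(reg, '<span class="replaced">$1</span>');

// 1. A literal ">" in the text after the term makes the lookahead
//    treat the term as if it were inside a tag, so it is skipped.
const withGt = wrap("<p>Word1 > Word2</p>");

// 2. A term that is split by markup is never seen as one string.
const split = wrap("<p>Wo<b>rd1</b></p>");

console.log(withGt); // unchanged input
console.log(split);  // unchanged input
```

If cases like these matter for your content, walking the text nodes with the DOM API is the safer route.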

Roko C. Buljan
  • First of all, thank you for your answer and the link to the useful post. I'm well aware of the impact of parsing HTML with regex, but it looks like my situation requires it, at least at this stage (I know that everyone tries to explain that usage :D ) – kirill.buga Mar 23 '16 at 08:39