-1

I'm totally stuck. My JavaScript isn't good enough to get me out of this problem. I'm trying to use a JavaScript regular expression to capture the words of a French text while excluding HTML tags. Here is my current regex:

([^\r\n\t\f>< /]+(?!>))\b

The problem is that accented characters are excluded, and the regex also captures tags that it should not (e.g. the br HTML tag).

Here is a direct link to the test: https://regex101.com/r/oT9uC1/10

My goal is to wrap every word in a span HTML tag. Thank you all in advance for your help.

gagogago

2 Answers

3

Don't use regular expressions to parse HTML; it will always fail at some point.

Instead use the DOM API for this, which knows HTML better than anyone else:

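var span = document.createElement('span');        // detached helper element, never added to the page
span.innerHTML = html;                            // let the browser parse the HTML
var text = span.textContent;                      // plain text only, with all tags dropped
var words = text.split(/\s+/).filter(Boolean);    // split on whitespace and drop empty strings
console.log(words);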

To wrap each word in a span tag, you can continue like this:

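html = words.map(function (word) {
    span.textContent = word;                       // let the DOM escape &, < and >
    return '<span>' + span.innerHTML + '</span>';  // read the escaped markup back
}).join(' ');                                      // join into one HTML string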
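
Splitting on whitespace leaves punctuation attached to the words ("Noël," stays as one token). If only the words themselves are wanted, a Unicode-aware pattern can be applied to the extracted text (not to the raw HTML). The question's regex loses accented letters because `\b` and `\w` in JavaScript only treat ASCII letters, digits and `_` as word characters. The following is a minimal sketch, assuming an engine that supports `\p{...}` property escapes with the `u` flag (ES2018+). The names `WORD_PATTERN` and `extractWords` are made up for this illustration, and treating apostrophes and hyphens as part of a word is my own assumption about what counts as a French word.

```javascript
// Runs of letters, combining marks and digits, optionally joined by an
// apostrophe or hyphen (e.g. "l'été", "peut-être"). Needs the `u` flag.
const WORD_PATTERN = /[\p{L}\p{M}\p{N}]+(?:['’-][\p{L}\p{M}\p{N}]+)*/gu;

// Hypothetical helper: returns every word found in a plain-text string.
function extractWords(text) {
  return text.match(WORD_PATTERN) || [];
}

console.log(extractWords("L'été, à Noël : peut-être déjà vu !"));
// → [ "L'été", 'à', 'Noël', 'peut-être', 'déjà', 'vu' ]
```

In the snippet above this answer's `text.split(/\s+/)` could be swapped for `extractWords(text)`, at the cost of dropping the punctuation entirely.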
trincot
  • My goal is to wrap every word in a span HTML tag as a second step. Is that possible? – gagogago May 31 '16 at 16:29
  • Yes, that is possible, but if you have a question about how to wrap words in a tag, then that really is a different question. Still I added something in my answer. May I suggest you look also at earlier questions about that (like [here](http://stackoverflow.com/questions/8609170/how-to-wrap-each-word-of-an-element-in-a-span-tag)) or post a new question? – trincot May 31 '16 at 18:28
  • Did this answer your question? Could you let me know? – trincot Jun 06 '16 at 07:30
1

Here's how I'd turn every word in an element into a span. I'd avoid regex, since DOM tools are provided by default.

var elementWithWords = document.getElementById('myElementId');  // get a reference to your element
var words = elementWithWords.textContent.split(/\s+/).filter(Boolean);  // split on whitespace, dropping empty strings
elementWithWords.textContent = '';  // clear out the element (this also removes any child markup)
for (var i = 0; i < words.length; i++) {  // for each word, create a span and append it to the original element
  var wordSpan = document.createElement('span');
  wordSpan.textContent = words[i];
  elementWithWords.appendChild(wordSpan);
  elementWithWords.appendChild(document.createTextNode(' '));  // keep a space between words
}

EDIT: You could probably adapt the first snippet with some finagling; however, the code below should work for simple cases and keep your formatting. Note that any time you set the innerHTML of an element, you potentially open yourself up to cross-site scripting (XSS) attacks.

var elementWithWords = document.getElementById('myElementId');  // get a reference to your element
var words = elementWithWords.textContent.split(/\s+/).filter(Boolean);  // split on whitespace, dropping empty strings

for (var i = 0; i < words.length; i++) {  // for each word, wrap its first occurrence in the markup in a span
  var word = words[i];
  // Caveat: a string pattern replaces only the first match, so repeated words, or words that
  // also occur inside tag names or attributes (e.g. "span", "br"), can be wrapped in the wrong place.
  elementWithWords.innerHTML = elementWithWords.innerHTML.replace(word, "<span>" + word + "</span>");
}
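
If repeated words or words that collide with tag names matter, one alternative is to leave innerHTML alone and rewrite the text nodes instead. This is a rough sketch under my own assumptions, not a drop-in solution: `tokenize` and `wrapWords` are names invented for this example, the word pattern (letters, marks and digits, with inner apostrophes or hyphens) is only one possible definition of a word, and `myElementId` is the same placeholder id as above. Existing tags such as br are never touched because only text nodes are replaced.

```javascript
// Letters, combining marks and digits, optionally joined by ' ’ or -.
const WORD = /[\p{L}\p{M}\p{N}]+(?:['’-][\p{L}\p{M}\p{N}]+)*/gu;

// Split a string into word and non-word pieces without losing any character.
function tokenize(text) {
  const tokens = [];
  let last = 0;
  for (const m of text.matchAll(WORD)) {
    if (m.index > last) tokens.push({ word: false, text: text.slice(last, m.index) });
    tokens.push({ word: true, text: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) tokens.push({ word: false, text: text.slice(last) });
  return tokens;
}

// Replace every text node under `root` with spans (words) and text nodes (the rest).
function wrapWords(root) {
  const doc = root.ownerDocument;
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const textNodes = [];
  while (walker.nextNode()) textNodes.push(walker.currentNode);  // collect first, then mutate
  for (const node of textNodes) {
    const fragment = doc.createDocumentFragment();
    for (const token of tokenize(node.nodeValue)) {
      if (token.word) {
        const span = doc.createElement('span');
        span.textContent = token.text;  // textContent, so nothing is parsed as HTML
        fragment.appendChild(span);
      } else {
        fragment.appendChild(doc.createTextNode(token.text));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}

// Browser-only usage; skipped when no DOM is available.
if (typeof document !== 'undefined') {
  const el = document.getElementById('myElementId');
  if (el) wrapWords(el);
}
```

Because the words are inserted through textContent rather than innerHTML, the XSS concern mentioned above does not apply to this variant.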
SethWhite