I have a sentence that contains HTML tags with attributes: "hey how <span target="">you</span>"

I would like to put 'hey', 'how', and '<span target="">you</span>' in an array.

I used string.split(' ') and got ['hey', 'how', '<span', 'target="">you</span>'].

Is it possible to split the sentence into words while using a regex to keep anything that begins with < and ends with > as a single item?

Thank you

Seabon
  • Parsing HTML with regex is [dangerous territory](http://stackoverflow.com/a/1732454/382456), why do you need this? – Scott Jan 12 '17 at 16:14
  • I have a sentence such as "hello how are you" (example) and I have to extract each word into an array, and sometimes there are HTML tags in it. What do you suggest if regex is not appropriate? – Seabon Jan 12 '17 at 16:56

4 Answers

If the HTML tags are not nested, this would help:

console.log('hey how <span target="">you</span>'.match(/(?!<)\S+|<(\w+)\b[^]*?\/\1>/g));
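To make the pattern above easier to read, here is a minimal sketch that wraps it in a function and comments each alternative. The function name `splitWordsKeepingTags` is made up for illustration, and `[\s\S]` is swapped in for the JavaScript-only `[^]`; both mean "any character, including newlines".

```javascript
// Hypothetical helper name; the pattern is the one from the answer,
// with [\s\S] used instead of [^].
function splitWordsKeepingTags(sentence) {
  // Alternative 1: (?!<)\S+
  //   a run of non-whitespace characters that does not start with "<"
  // Alternative 2: <(\w+)\b[\s\S]*?\/\1>
  //   "<name", then lazily anything, up to the first "/name>" with the same name
  const pattern = /(?!<)\S+|<(\w+)\b[\s\S]*?\/\1>/g;
  // match() returns null when nothing matches, so fall back to an empty array
  return sentence.match(pattern) || [];
}

console.log(splitWordsKeepingTags('hey how <span target="">you</span>'));
// [ 'hey', 'how', '<span target="">you</span>' ]
```

Because the second alternative stops at the first closing tag with the same name, nested tags of the same kind are cut short, which is why this only suits non-nested markup.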
revo

Regex should not be used to parse HTML; see "RegEx match open tags except XHTML self-contained tags".

Maybe you should use jQuery?

$("#txt").text()
Max Paymar

This should do the trick, but as @Scott mentioned, regex may not be the appropriate way to handle HTML, depending on your data.

const regex = /\<.+?>.*?\<.+?>|\S+/g;
const str = `hey how <span target="">you</span>`;

console.log(str.match(regex));
Wagner DosAnjos

I would say don't parse HTML using regex. Using jQuery or plain JavaScript DOM functions is much easier and safer.

For example, given

<div id="sentence">
  hey how <span target="">you</span>
</div>

do something along the lines of

$("#sentence").text() // jQuery

or

document.getElementById("sentence").innerText // plain JavaScript

Both will give you the text without the markup (plus any surrounding whitespace): hey how you
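Both calls return plain text, so the span markup is lost. If the tag has to stay in the array, as the question asks, one option is to walk the container's child nodes instead of using a regex. The following is only a sketch under that assumption: `tokensFromChildNodes` is a hypothetical helper name, and it looks at direct children only, so text nodes are split into words while each element is kept whole via its `outerHTML`.

```javascript
// Hypothetical helper: walks the direct children of a container element.
// Text nodes are split on whitespace; element nodes are kept whole as markup.
function tokensFromChildNodes(container) {
  const ELEMENT_NODE = 1; // same value as Node.ELEMENT_NODE
  const TEXT_NODE = 3;    // same value as Node.TEXT_NODE
  const tokens = [];
  for (const node of container.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      // Drop the empty strings produced by leading/trailing whitespace
      const words = node.textContent.split(/\s+/).filter(Boolean);
      tokens.push(...words);
    } else if (node.nodeType === ELEMENT_NODE) {
      tokens.push(node.outerHTML);
    }
  }
  return tokens;
}

// In a browser, with the <div id="sentence"> from above:
// tokensFromChildNodes(document.getElementById("sentence"));
```

For the div above this should produce the two words followed by the span's markup as serialized by the browser. Other node types, such as comments, are skipped.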

Piyush