
Is it possible to split a string on the space character " " while ignoring the spaces inside the HTML tags it contains?
The HTML tags may have style attributes such as: style="font-size:14px; color: rgb(0, 0, 0)" .....

The string I'm talking about is:

<div class="line"><span style="color: rgb(0,0,0)">John</span><u> has</u><b> apples</b></div>

As you can see, there is a space character inside the u tag and inside the b tag.

What I am trying to get is the text split as follows:

<div class="line"><span style="color: rgb(0,0,0)">John</span><u>

has</u><b>

apples</b></div>

I have the following regex, but it does not give me the rest of the string, just the first 2 parts:

[\<].+?[\>]\s
  • What is the exact use case? There could be another way. Edit: what are you trying to get? – Diamond Sep 30 '15 at 08:13
  • I think you could look for a space " " between ">" and "<" – Neo Sep 30 '15 at 08:21
  • I don't know how to completely ignore whitespace between < > – user5379593 Sep 30 '15 at 08:27
  • Someone's gotta post it: [*Don't parse HTML with a regular expression…*](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454). – RobG Sep 30 '15 at 08:42

1 Answer


Split using the following regexp:

str.split(/ (?=[^>]*(?:<|$))/)

[
  '<div class="line"><span style="color: rgb(0,0,0)">John</span><u>',
  'has</u><b>',
  'apples</b></div>'
]

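The ?= is a look-ahead. It says, "find spaces which are followed by some sequence of characters that are NOT greater-than signs, then a less-than sign (or end of string)." A space inside a tag, such as the one before an attribute, fails this test because the next angle bracket after it is a ">" rather than a "<".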
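
To see the look-ahead at work, here is a minimal sketch that runs the split on the string from the question. The variable names `html` and `parts` are only for illustration, and the approach assumes that no literal ">" appears inside an attribute value or in the text itself:

```javascript
// The string from the question; `html` is a hypothetical variable name.
const html = '<div class="line"><span style="color: rgb(0,0,0)">John</span><u> has</u><b> apples</b></div>';

// Split only on spaces whose next angle bracket is "<" (or that are followed
// by no angle bracket at all), i.e. spaces in text content, not inside a tag.
const parts = html.split(/ (?=[^>]*(?:<|$))/);

console.log(parts.length); // 3
parts.forEach((piece) => console.log(piece));
```

The spaces in `<div class=...`, `<span style=...` and `color: rgb` are left alone, because each of them is followed by a ">" before any "<".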

The ?: is a non-capturing group. We need that here, because split has a special behavior: the presence of a capturing group tells it to include the splitters in the resulting array of pieces, which we don't want.
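
The difference can be checked with a small sketch that runs both variants on a shortened version of the string. The names `sample`, `withCapture` and `withoutCapture` are hypothetical:

```javascript
// A shortened version of the string from the question.
const sample = '<u> has</u><b> apples</b>';

// Capturing group: whatever the group matched ("<" here) is spliced into the result.
const withCapture = sample.split(/ (?=[^>]*(<|$))/);

// Non-capturing group: only the pieces between the separators are returned.
const withoutCapture = sample.split(/ (?=[^>]*(?:<|$))/);

console.log(withCapture);    // [ '<u>', '<', 'has</u><b>', '<', 'apples</b>' ]
console.log(withoutCapture); // [ '<u>', 'has</u><b>', 'apples</b>' ]
```

With the capturing version, every split point contributes an extra "<" element, which is why the answer uses (?:...) instead.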