I use a regex in my split function:
string.split(/\s/)
But &#8202; (which is a Hair Space) is not recognised. How can I make sure it is, without hard-coding that exact entity in the regular expression?
Per MDN, the definition of \s in a regex (in the Firefox browser) is this:
[ \f\n\r\t\v\u00a0\u1680\u180e\u2000\u2001\u2002\u2003\u2004\u2005\u2006\u2007\u2008\u2009\u200a\u2028\u2029\u202f\u205f\u3000]
So, if you want to split on something in addition to this (e.g. an HTML entity), you will need to add it to your own regex. Remember, string.split() is not an HTML function; it's a string function, so it doesn't know anything special about HTML. If you want to split on certain HTML tags or entities, you will have to write a regex that includes the things you want to split on.
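To make the distinction concrete, here is a small check, assuming a current JavaScript engine (older browsers were less consistent). The variable names are only illustrative. It shows that \s matches the actual hair space character (U+200A) but not the seven-character entity text that stands for it in HTML source:

```javascript
// The real character U+200A versus the literal entity text "&#8202;"
const hairSpaceChar = "\u200a";
const hairSpaceEntity = "&#8202;";

const matchesChar = /\s/.test(hairSpaceChar);     // the character is whitespace
const matchesEntity = /\s/.test(hairSpaceEntity); // the entity is just "&", "#", digits, ";"

console.log(matchesChar);                  // true
console.log(matchesEntity);                // false
console.log("a\u200ab".split(/\s/));       // ["a", "b"]
console.log("a&#8202;b".split(/\s/));      // ["a&#8202;b"]
```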
You can code for it yourself like this:
string.split(/\s|&#8202;/);
Working demo: http://jsfiddle.net/jfriend00/nAQ97/
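As a slightly fuller sketch of the same idea, the helper below (a hypothetical name, not part of the demo) splits on either real whitespace or the literal entity text, and treats a run of separators as a single split point:

```javascript
// Hypothetical helper: split on real whitespace or on the literal text "&#8202;".
function splitOnWhitespaceOrHairSpaceEntity(text) {
  // (?:...)+ collapses consecutive separators so no empty strings appear between words.
  return text.split(/(?:\s|&#8202;)+/);
}

const words = splitOnWhitespaceOrHairSpaceEntity("one&#8202;two three\u200afour");
console.log(words); // ["one", "two", "three", "four"]
```

Any other entity you care about would have to be added to the alternation by hand in the same way.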
If what you really want to do is to have your HTML parsed and converted to text by the browser (which will process all entities and HTML tags), then you can do this:
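function getPlainText(str) {
    // Let the browser parse the markup; entities and tags are resolved in the process
    var x = document.createElement("div");
    x.innerHTML = str;
    // textContent is the standard property; innerText is the fallback for older IE
    return (x.textContent || x.innerText);
}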
Then, you could split your string like this:
getPlainText(str).split(/\s/);
Working demo: http://jsfiddle.net/jfriend00/KR2aa/
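getPlainText() needs a DOM. If none is available (for example outside a browser), a rough substitute is to decode the entities yourself before splitting. The sketch below is an assumption-laden illustration: decodeNumericEntities is a hypothetical helper that only handles numeric references such as &#8202; or &#x200A;, not named entities or tags.

```javascript
// Hypothetical helper: turn numeric character references into the characters they stand for.
function decodeNumericEntities(text) {
  return text.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range references untouched instead of throwing a RangeError.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

const parts = decodeNumericEntities("a&#8202;b&#x200A;c").split(/\s/);
console.log(parts); // ["a", "b", "c"]
```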
If you want to make absolutely sure this works in older browsers, you have three choices: test the functions above in every browser you care about; use a custom regex that lists all the entities you want to split on (the first option); or search for every Unicode character you want to split on and replace it with a regular space before splitting (the second option). Because older browsers weren't very consistent here, there is no free lunch if you want safe compatibility with them.
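The search/replace approach could look roughly like this. It is only a sketch: the character class is the \s list quoted earlier written out explicitly so that it does not depend on how a given browser defines \s, and the names UNICODE_SPACES and splitOnAnyListedSpace are made up for illustration.

```javascript
// Explicit list of space characters, so the result does not depend on the engine's \s.
const UNICODE_SPACES = /[\f\n\r\t\v\u00a0\u1680\u180e\u2000-\u200a\u2028\u2029\u202f\u205f\u3000]/g;

// Hypothetical helper: normalise every listed character to a plain space, then split on it.
function splitOnAnyListedSpace(text) {
  return text.replace(UNICODE_SPACES, " ").split(" ");
}

const tokens = splitOnAnyListedSpace("a\u200ab\u00a0c d");
console.log(tokens); // ["a", "b", "c", "d"]
```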