At the moment I am working on text that is broken into floating columns to display it in a magazine-like
way.
I asked in a previous question how to split
the text into sentences and it works like a charm:
sentences = text.replace(/\.\s+/g,'.|').replace(/\?\s/g,'?|').replace(/\!\s/g,'!|').split("|");
Now I want to go a step further and split it into words. However, the text also contains some elements that should not be split, such as subheadlines.
An example text would be:
A wonderful serenity has taken possession of my entire soul. <strong>This is a subheadline</strong><br><br>I am alone, and feel the charm of existence in this spot.
My desired result would look like the following:
Array [
"A",
"wonderful",
"serenity",
"has",
"taken",
"possession",
"of",
"my",
"entire",
"soul.",
"<strong>This is a subheadline</strong>",
"<br>",
"<br>",
"I",
"am",
"alone,",
"and",
"feel",
"the",
"charm",
"of",
"existence",
"in",
"this",
"spot."
]
When I split at all whitespace I do get the words, but the "<br>" tags
are not added as separate array entries. I also don't want to split the subheadline and its markup.
The reason I want to do this is that I add sequence after sequence to a p-tag, and when its height gets bigger than the surrounding element I remove the last added sequence and create a new floating p-tag. When I split the text into sentences, I saw that the breaks were not fine-grained enough to ensure a good reading flow.
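To make the procedure concrete, here is a rough sketch of the filling loop. The names `fillColumns` and `fits` are made up for this illustration: `fits(tokens)` stands in for whatever check decides whether the current p-tag still fits inside the surrounding element, so this is only an outline under that assumption, not my actual code.

```javascript
// Sketch: greedily distribute tokens (sentences or words) over columns.
// `fits` is a hypothetical callback that returns true while the given
// tokens still fit into one column.
function fillColumns(tokens, fits) {
  var columns = [];
  var current = [];

  tokens.forEach(function (token) {
    current.push(token);
    // If the column overflows, take the last token back out and start
    // a new column with it. A single token that never fits is left
    // alone in its own column so the loop always makes progress.
    if (!fits(current) && current.length > 1) {
      current.pop();
      columns.push(current);
      current = [token];
    }
  });

  if (current.length > 0) {
    columns.push(current);
  }
  return columns;
}
```

In the browser, `fits` would presumably write the joined tokens into the current p-tag and compare its `offsetHeight` with that of the surrounding element. The smaller the tokens, the closer each column gets to being completely filled, which is why I want words instead of sentences.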
You can see an example of what I am trying to achieve here
If you need any further information, I will be glad to provide it.
Thanks in advance,
Tobias
EDIT
The string could contain more HTML tags in the future. Is there a way to not touch anything between these tags?
EDIT 2
I created a jsfiddle: http://jsfiddle.net/m9r9q/1/
EDIT 3
Would it be a good idea to remove all HTML tags together with their enclosed text and replace them with placeholders? Then I could split the string into words and put the untouched HTML tags back when a placeholder is reached. What would be the regex to extract all HTML tags?
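A minimal sketch of that placeholder idea, as far as I can work it out myself. The function name `splitKeepingTags` and the NUL-delimited placeholder format are my own assumptions, and the regex only handles simple, non-nested markup (an element with a matching closing tag of the same name, or a single tag such as `<br>`); it is not a real HTML parser.

```javascript
// Sketch: hide HTML behind placeholders, split on whitespace, restore.
function splitKeepingTags(text) {
  var stash = [];
  // Either <tag ...>...</tag> with the same tag name (no nesting),
  // or any single tag such as <br>.
  var tagPattern = /<([a-z][a-z0-9]*)\b[^>]*>[\s\S]*?<\/\1>|<[^>]+>/gi;

  var masked = text.replace(tagPattern, function (match) {
    stash.push(match);
    // Surround the placeholder with spaces so it becomes its own token.
    // \u0000 is assumed never to occur in the real text.
    return ' \u0000' + (stash.length - 1) + '\u0000 ';
  });

  return masked
    .split(/\s+/)
    .filter(function (token) { return token.length > 0; })
    .map(function (token) {
      // Swap each placeholder back for the markup it stands for.
      var m = /^\u0000(\d+)\u0000$/.exec(token);
      return m ? stash[Number(m[1])] : token;
    });
}
```

With the example text from above this gives the array I described, with the subheadline and each `<br>` as separate entries. One drawback: it no longer records whether there was whitespace between a word and an adjacent tag, so that information is lost when the tokens are joined again.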