I am getting a string of HTML markup like this:

const markup = "<p>thank you for contacting us.&nbsp;<span class=“ck-restricted-editing-exception”>Your</span> case was logged as&nbsp;<span class=“ck-restricted-editing-exception”>Case ID</span>&nbsp;and is assigned to&nbsp;<span class=“ck-restricted-editing-exception”>Technician Name</span>. We will attempt to resolve your issue within the next&nbsp;<span class=“ck-restricted-editing-exception”>Time</span>&nbsp;hours.</p>"

What I want to do is select all the <span>s (one by one), get their inner content, store it in an array of objects, and assign a unique id or something similar to each span.

Like so

const spanValues = [
  { spanId: 1, spanContent: 'Your' },
  // ...and so on
]

I am thinking of splitting the string by "<span" and "</span>" to get an array, then looping over that array, finding all the elements that start with "<span" and end with "</span>", and performing operations on those strings.

I can't figure out how to get and store the values from the <span>s, though.

It already sounds like a messy solution, however. Can anyone suggest something else?

Ajay Gupta
  • In what environment? A browser? Node.js? Something else? – T.J. Crowder Dec 12 '19 at 07:40
  • @T.J.Crowder Basically this will be run inside a WebView in a mobile app built with React Native. So a browser it is. – Ajay Gupta Dec 12 '19 at 07:41
  • Instead of manipulating the string, it would be cleverer to actually manipulate the DOM instead, otherwise if you want to work against that string, you should probably use a regex. – briosheje Dec 12 '19 at 07:42
  • @briosheje Makes sense, but the markup is coming from a web based text editor i.e. CKEditor 5. Can you elaborate on how it can be done with Regex? – Ajay Gupta Dec 12 '19 at 07:43
  • @AjayGupta ok, I will post a regex-based solution – briosheje Dec 12 '19 at 07:46

2 Answers

You need an HTML parser to parse HTML; it's too complicated for basic string splitting and such.

Fortunately, no matter what your environment, you almost always have an HTML parser available to you. For instance, in browsers, of course the browser knows how:

const div = document.createElement("div");
div.innerHTML = markup;

const spanValues = [...div.querySelectorAll("span")].map((span, index) => ({
    spanId: index + 1,
    spanContent: span.textContent
}));

Live Example:

const markup = "<p>thank you for contacting us.&nbsp;<span class=“ck-restricted-editing-exception”>Your</span> case was logged as&nbsp;<span class=“ck-restricted-editing-exception”>Case ID</span>&nbsp;and is assigned to&nbsp;<span class=“ck-restricted-editing-exception”>Technician Name</span>. We will attempt to resolve your issue within the next&nbsp;<span class=“ck-restricted-editing-exception”>Time</span>&nbsp;hours.</p>";

const div = document.createElement("div");
div.innerHTML = markup;

const spanValues = [...div.querySelectorAll("span")].map((span, index) => ({
    spanId: index + 1,
    spanContent: span.textContent
}));

console.log(spanValues);

That specific example relies on the NodeList from querySelectorAll being iterable, which it is on modern browsers. (See my answer here for more and how to polyfill it on browsers where arrays are iterable but NodeList isn't.)

Or you can use Array.prototype.map on it directly:

const spanValues = Array.prototype.map.call(div.querySelectorAll("span"), (span, index) => ({
    spanId: index + 1,
    spanContent: span.textContent
}));
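
The question also asks for an id to be assigned to each span. A minimal sketch of one way to do that with the same DOM approach follows; `tagSpans` and the `data-span-id` attribute name are hypothetical choices for illustration, not something CKEditor defines, and the sample markup is a shortened stand-in for the real string.

```javascript
// Hypothetical helper: numbers each <span> inside `container`, writes that
// number onto the element as a data-span-id attribute (an assumed name),
// and returns the collected values.
function tagSpans(container) {
  return Array.from(container.querySelectorAll("span"), (span, index) => {
    const spanId = index + 1;
    span.setAttribute("data-span-id", String(spanId));
    return { spanId, spanContent: span.textContent };
  });
}

// Browser-only usage; skipped where there is no DOM (e.g. plain Node.js).
if (typeof document !== "undefined") {
  const container = document.createElement("div");
  // Shortened sample markup, assumed for illustration.
  container.innerHTML =
    '<p><span class="x">Your</span> case is <span class="x">Case ID</span></p>';
  console.log(tagSpans(container));
  // The serialized markup now carries the data-span-id attributes.
  console.log(container.innerHTML);
}
```

Because the id is stored on the element itself, reading `container.innerHTML` afterwards should give back markup in which each span can be matched to its entry in the array.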
T.J. Crowder

PREFACE

BEWARE that such a simple regex would fail on COMPLEX cases. As mentioned by T.J. below, you just can't parse HTML with a regex, but since the question mentions that the text is coming from CKEditor, I would assume you can restrict the number of cases you will encounter.

Once again, this solution presumes you are unable to parse the HTML in any other way. If you can somehow parse the HTML, do not rely on this solution.

PREFACE ENDS HERE.

Here is a solution assuming your markup is actually textual (as mentioned in the comments) and the ids are simply the order in which the matches were encountered. It also assumes there are no other solutions than using a regex. In a nutshell, it assumes you just can't work with any kind of virtual DOM, HTML parser or something similar, hence a regex is needed.

This example relies on a single regex: https://regex101.com/r/Y6KreE/1

And takes advantage of a while loop to build up the array of objects.

const markup = "<p>thank you for contacting us.&nbsp;<span class=“ck-restricted-editing-exception”>Your</span> case was logged as&nbsp;<span class=“ck-restricted-editing-exception”>Case ID</span>&nbsp;and is assigned to&nbsp;<span class=“ck-restricted-editing-exception”>Technician Name</span>. We will attempt to resolve your issue within the next&nbsp;<span class=“ck-restricted-editing-exception”>Time</span>&nbsp;hours.</p>";

const regex = /<span[^>]*>(.+?)<\/span>/gm;
const spanValues = [];
let matchGroup, i = 0;
while ((i++, matchGroup = regex.exec(markup)) !== null) {
  spanValues.push({
    spanId: i,
    spanContent: matchGroup[1]
  });
}

console.log(spanValues);
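
As a possible variation on the loop above, the same regex can be fed to `String.prototype.matchAll` (ES2020), which avoids managing the counter by hand. This is only a sketch under the same assumptions as the rest of this answer; the sample markup is a shortened, made-up stand-in, and the variable names are illustrative.

```javascript
// Shortened sample markup, assumed for illustration.
const sampleMarkup =
  '<p>Hi&nbsp;<span class="x">Your</span> case&nbsp;<span class="x">Case ID</span></p>';

// Same pattern as above; matchAll requires the global flag.
const spanRegex = /<span[^>]*>(.+?)<\/span>/g;

// Array.from accepts the iterator plus a mapping function (match, index).
const spanValuesFromMatchAll = Array.from(
  sampleMarkup.matchAll(spanRegex),
  (match, index) => ({
    spanId: index + 1,
    spanContent: match[1]
  })
);

console.log(spanValuesFromMatchAll);
```

It carries exactly the same limitations as the `while` version, since the pattern is unchanged.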
T.J. Crowder
briosheje
  • This breaks as soon as there's a `>` in an attribute on one of the `span`s. Famously, you [can't parse HTML with a regular expression](https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags#1732454). Every single time you say "Oh, but my example is nice and simple..." it ends up getting changed and breaking. Every. Time. – T.J. Crowder Dec 12 '19 at 08:01
  • @T.J.Crowder I know; that was built on that example, though. I'm aware this would likely fail in more complex cases, but as an example it might be worth a shot. However, you may know as well that CKEditor does not put any `>` in an attribute; it is not meant to do so, hence my post. – briosheje Dec 12 '19 at 08:03
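
To make the failure mode discussed in these comments concrete, here is a small sketch that runs the regex from the answer against a span whose attribute contains a `>`. The `title` attribute and its value are made up for illustration; as noted above, CKEditor may never emit such markup.

```javascript
// The pattern from the answer, without the global flag since one match suffices.
const fragileRegex = /<span[^>]*>(.+?)<\/span>/;

// Hypothetical input: a ">" inside a quoted attribute value.
const trickyMarkup = '<span title="a > b">Your</span>';

// [^>]* stops at the ">" inside the attribute, so the captured group
// starts in the middle of the tag instead of at the span's content.
const trickyMatch = fragileRegex.exec(trickyMarkup);
console.log(trickyMatch[1]); // prints: b">Your (with a leading space)
```

An HTML parser would report the content as just `Your` here, which is the gap the DOM-based answer closes.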