The pattern .+ is greedy, and you are relying solely on . not matching a line break (which only holds without the s flag). This might not be robust.
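To illustrate the fragility, here is a small sketch using a pattern I am assuming resembles yours (/<!-- paragraph (.+) -->/ is hypothetical, since the exact regex is not shown here). With line breaks present, the greedy .+ happens to stop at the right -->. On a single line, or with the s flag, it runs on to the last --> instead.

```javascript
// Hypothetical pattern, assumed to resemble the one in the question.
const greedyPattern = /<!-- paragraph (.+) -->/;
// Same pattern with the s (dotAll) flag, so . also matches "\n".
const greedyDotAll = /<!-- paragraph (.+) -->/s;

const multiLine = '<!-- paragraph {"className":"123"} -->\n<p>abc</p>\n<!-- /paragraph -->';
const oneLine = '<!-- paragraph {"className":"123"} --><p>abc</p><!-- /paragraph -->';

// . stops at "\n", so the greedy match can only backtrack to the "-->" on line 1.
const multiLineCapture = multiLine.match(greedyPattern)[1];

// Same markup on one line: greedy .+ extends to the LAST " -->" in the string.
const oneLineCapture = oneLine.match(greedyPattern)[1];

// With the s flag, the multi-line input overshoots as well.
const dotAllCapture = multiLine.match(greedyDotAll)[1];

console.log(multiLineCapture); // {"className":"123"}
console.log(oneLineCapture); // {"className":"123"} --><p>abc</p><!-- /paragraph
console.log(dotAllCapture.includes("/paragraph")); // true
```

The correct result in the first case depends entirely on where the line breaks happen to be, which is the point above.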
Your string is HTML, so an HTML parser such as DOMParser is a more appropriate starting point.
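Since HTML comments can appear anywhere, the parser places them in different parts of the resulting document depending on where they occur (a comment at the very start of the input, for instance, becomes a child of the document itself rather than of <body>). Wrapping the string in <body>…</body> makes sure everything ends up in one consistent place.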
You can later access the contents via .body.childNodes.
Next, use Array.from to convert the list of Nodes into a proper Array and filter it
- by nodeType to get only the HTML comment nodes (using the static properties on Node), and
- by textContent to get only those comments starting with paragraph and not those starting with /paragraph (using trim and startsWith).
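Outside a browser there is no DOMParser or Node global, so the sketch below fakes the parser's output to show the filter on its own. The plain objects are hypothetical stand-ins that only carry nodeType and textContent, and they are wrapped in an array-like object because a NodeList is array-like (it has indices and length, but no filter method). The value 8 is what Node.COMMENT_NODE evaluates to.

```javascript
// Numeric node types as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE, Node.COMMENT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Hypothetical stand-ins for body.childNodes of one paragraph block.
const nodes = [
  { nodeType: COMMENT_NODE, textContent: ' paragraph {"className":"123"} ' },
  { nodeType: TEXT_NODE, textContent: "\n" },
  { nodeType: ELEMENT_NODE, textContent: "abc" },
  { nodeType: TEXT_NODE, textContent: "\n" },
  { nodeType: COMMENT_NODE, textContent: " /paragraph " },
];

// Array-like object (indices + length, no .filter), similar to a NodeList.
const fakeNodeList = { ...nodes, length: nodes.length };

const openingComments = Array.from(fakeNodeList)
  // Comment nodes only, and only those opening a paragraph (not "/paragraph").
  .filter(({ nodeType, textContent }) =>
    nodeType === COMMENT_NODE && textContent.trim().startsWith("paragraph"));

console.log(openingComments.length); // 1
```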
Then map over the resulting comment nodes to get their text contents.
Now it’s a bit unclear what the format is. Is it always one word (with no spaces), followed by a single space, followed by the {…} structure? Can there be multiple {…} structures? Can there be something after the {…} structure?
You’ll have to figure this out for yourself and refine any regex accordingly, but I’m going to assume that the paragraph…/paragraph pair is analogous to HTML tags, which would mean that the first space is followed by the {…} structure.
However, I’m not going to assume that these {"className":"123"} structures are always going to be free of spaces.
Splitting at the first space only, discarding the text before it, and keeping the rest can be achieved by splitting at every space, taking everything from index 1 onward, and joining the pieces with a space again: .split(" ").slice(1).join(" ").
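Here is that step in isolation, using a made-up comment text whose JSON contains spaces. Taking only the second piece of split(" ") would cut the JSON apart, while slice(1).join(" ") reassembles it. Slicing after indexOf(" ") is an equivalent alternative I'm adding for comparison; it is not what the full code uses.

```javascript
// Made-up comment text (already trimmed) whose JSON contains spaces.
const commentText = 'paragraph {"className": "1 2 3"}';

// Naive: only the second piece survives, which is not valid JSON.
const naive = commentText.split(" ")[1];

// Split at every space, drop the first word, re-join the rest with spaces.
const rest = commentText.split(" ").slice(1).join(" ");

// Alternative: cut everything up to and including the first space.
const restViaIndexOf = commentText.slice(commentText.indexOf(" ") + 1);

console.log(naive); // {"className":
console.log(rest); // {"className": "1 2 3"}
console.log(JSON.parse(rest).className); // 1 2 3
```

The two approaches only differ when the text contains no space at all: split/slice/join yields an empty string, whereas indexOf returns -1, so slice(0) returns the whole text. JSON.parse would reject either result.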
The intermediate result is:
[
"{\"className\":\"123\"}",
"{\"className\":\"456\"}",
"{\"className\":\"789\"}"
]
These are JSON strings. Use JSON.parse (in the existing map) to parse everything and access the className property.
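Applied to the intermediate strings above, the parse step inside map looks like this (a standalone sketch that runs without a DOM):

```javascript
// The intermediate JSON strings from the previous step.
const jsonStrings = [
  '{"className":"123"}',
  '{"className":"456"}',
  '{"className":"789"}',
];

// Parse each string and read its className property.
const classNames = jsonStrings.map((json) => JSON.parse(json).className);

console.log(classNames); // [ '123', '456', '789' ]
```

If a comment might contain malformed JSON, note that JSON.parse throws a SyntaxError, so you may want to wrap it in try/catch.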
Now you have this intermediate result:
[
"123",
"456",
"789"
]
Joining it all with .join("") results in the desired "123456789" string.
Full code
const string =
`<!-- paragraph {"className":"123"} -->
<p>abc</p>
<!-- /paragraph -->
<!-- paragraph {"className":"456"} -->
<p>cde</p>
<!-- /paragraph -->
<!-- paragraph {"className":"789"} -->
<p>fgh</p>
<!-- /paragraph -->`,
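// Parse as HTML; the <body> wrapper keeps every comment inside document.body
result = Array.from(new DOMParser().parseFromString(`<body>${string}</body>`, "text/html")
.body
.childNodes)
// Keep only comment nodes whose text starts with "paragraph" (this skips "/paragraph")
.filter(({ nodeType, textContent }) => nodeType === Node.COMMENT_NODE && textContent.trim().startsWith("paragraph"))
// Drop the first word, parse the remaining JSON and read its className
.map(({ textContent }) => JSON.parse(textContent.trim()
.split(" ")
.slice(1)
.join(" "))
.className)
// Concatenate all class names into one string
.join("");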
console.log(result);
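DOMParser only exists in browsers, so if you want to check the string-processing part elsewhere (e.g. in Node.js), it can be pulled into a function. This is a sketch; the helper name classNamesFromComments is hypothetical, and it takes the comments' text (what textContent returns) rather than DOM nodes, so the nodeType check is left to the caller.

```javascript
// Hypothetical helper: comment texts in, concatenated class names out.
function classNamesFromComments(commentTexts) {
  return commentTexts
    .map((text) => text.trim())
    // Opening comments only; "/paragraph" does not start with "paragraph".
    .filter((text) => text.startsWith("paragraph"))
    // Drop the first word, parse the remaining JSON, read className.
    .map((text) => JSON.parse(text.split(" ").slice(1).join(" ")).className)
    .join("");
}

// Comment texts as the parser would report them for the sample string.
const texts = [
  ' paragraph {"className":"123"} ',
  " /paragraph ",
  ' paragraph {"className":"456"} ',
  " /paragraph ",
  ' paragraph {"className":"789"} ',
  " /paragraph ",
];

console.log(classNamesFromComments(texts)); // 123456789
```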