Why does this regex take so long to execute?

Question

I created a regex that's supposed to move text inside of an adjoining <span> tag.

const fix = (string) => string.replace(/([\S]+)*<span([^<]+)*>(.*?)<\/span>([\S]+)*/g, "<span$2>$1$3$4</span>")

fix('<p>Given <span class="label">Butter</span>&#39;s game, the tree counts as more than one input.</p>')
// Results in:
'<p>Given <span class="label">Butter&#39;s</span> game, the tree counts as more than one input.</p>'

But if I pass it a string where there is no text touching a <span> tag, it takes a few seconds to run.

I'm testing this on Chrome and Electron.

If you are concerned only with `span`, use this: `(.*?)<\/span>`. https://regex101.com/r/fL9rG0/1 – rock321987, Apr 24 '16 at 08:31
Also, I see an extra `*` in `([^<]+)*`, which I don't think is needed – rock321987, Apr 24 '16 at 08:33
If you don't have inner elements, replace `(.*?)` with `([^<]*)`. This will be much faster – Denys Séguret, Apr 24 '16 at 08:33
One more thing: your regex suffers from catastrophic backtracking if `</span>` is not present – rock321987, Apr 24 '16 at 08:34
"Don't do this" is the best answer. Use any of the [methods for parsing HTML in JavaScript](http://stackoverflow.com/questions/10585029/parse-a-html-string-with-js). – tadman, Apr 24 '16 at 08:46
@tadman, can you prove that it's faster to parse the html, manipulate it, and compile it into a string again? – demux, Apr 24 '16 at 09:16
@demux The performance characteristics of a regular expression of this sort are wildly unpredictable. On some strings it might be faster, but on others it might jam up and take literally forever. I guarantee that the DOMParser solution will produce *consistent* results even if they're not as performant. If this is only running hundreds of times, that cost is utterly irrelevant. If this is running frequently, then I'd be extremely concerned about using that regular expression. – tadman, Apr 24 '16 at 09:21

rock321987 · Accepted Answer · 2016-04-24T08:53:10.360

4

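([\S]+)* and ([^<]+)* are the culprits: they cause catastrophic backtracking when there is no </span>. Because a + is nested inside a *, the engine can split the same run of characters in exponentially many ways before it gives up. You need to modify your regex to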

([\S]*)<span([^<]*)>(.*?)<\/span>([\S]*)

It will work, but it's still not efficient.
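
To see the difference yourself, here is a rough timing sketch. The `timeMatch` helper is a made-up name for illustration, and the inputs are just runs of "a" with no <span> in them, so every attempt has to fail. On a backtracking engine like the one in Chrome, the time for the original pattern is expected to grow roughly exponentially with the input length, while the modified one should stay near zero. Exact numbers will vary by machine and engine version.

```javascript
// Original pattern from the question (quantifier nested inside a quantifier).
const nested = /([\S]+)*<span([^<]+)*>(.*?)<\/span>([\S]+)*/;
// Modified pattern with the nesting removed.
const flat = /([\S]*)<span([^<]*)>(.*?)<\/span>([\S]*)/;

// Hypothetical helper: run one match attempt and report how long it took.
const timeMatch = (regex, input) => {
  const start = Date.now();
  const matched = regex.test(input);
  return { matched, ms: Date.now() - start };
};

// Inputs contain no "<span", so both patterns must eventually report no match.
for (const n of [8, 12, 16, 20]) {
  const input = "a".repeat(n);
  console.log(
    `n=${n}`,
    `nested: ${timeMatch(nested, input).ms} ms`,
    `flat: ${timeMatch(flat, input).ms} ms`
  );
}
```

Keep n small when trying this. Each extra character roughly doubles the work for the nested pattern.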

Why use a character class for \S? The above reduces to

(\S*)<span([^<]*)>(.*?)<\/span>(\S*)
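
Plugged back into the function from the question, that could look like the sketch below. The name `fixSpans` is only for illustration, and the replacement string is the one from the question, unchanged.

```javascript
// Same replacement as in the question, with the nested quantifiers removed.
const fixSpans = (string) =>
  string.replace(/(\S*)<span([^<]*)>(.*?)<\/span>(\S*)/g, "<span$2>$1$3$4</span>");

const sample =
  '<p>Given <span class="label">Butter</span>&#39;s game, the tree counts as more than one input.</p>';

console.log(fixSpans(sample));
// '<p>Given <span class="label">Butter&#39;s</span> game, the tree counts as more than one input.</p>'

// A string with no <span> comes back unchanged.
console.log(fixSpans("no tags here at all"));
// 'no tags here at all'
```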

If you are concerned only about the content of the span, use this instead

<span([^<]*)>(.*?)<\/span>

Check here <= (See the reduction in number of steps)

NOTE: Finally, don't parse HTML with regex if there are tools that can do it much more easily.
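
For reference, a parser-based version might look roughly like this. It is only a sketch: it assumes a browser or Electron renderer where `DOMParser` is available, the name `moveAdjacentTextIntoSpans` is made up, and it only looks at text nodes directly next to each span. Note that the serialized result may differ slightly from the regex version; for example, `&#39;` is expected to come back as a plain `'`.

```javascript
// Assumption: runs where DOMParser exists (browser, Electron renderer).
const moveAdjacentTextIntoSpans = (html) => {
  const doc = new DOMParser().parseFromString(html, "text/html");

  for (const span of doc.querySelectorAll("span")) {
    // Non-whitespace text touching the span on the left goes to its start.
    const prev = span.previousSibling;
    if (prev && prev.nodeType === 3) {
      const m = prev.data.match(/\S+$/);
      if (m) {
        prev.data = prev.data.slice(0, m.index);
        span.prepend(m[0]);
      }
    }

    // Non-whitespace text touching the span on the right goes to its end.
    const next = span.nextSibling;
    if (next && next.nodeType === 3) {
      const m = next.data.match(/^\S+/);
      if (m) {
        next.data = next.data.slice(m[0].length);
        span.append(m[0]);
      }
    }
  }

  return doc.body.innerHTML;
};

// Only call it where a DOM is actually available.
if (typeof DOMParser !== "undefined") {
  console.log(
    moveAdjacentTextIntoSpans('<p>Given <span class="label">Butter</span>&#39;s game.</p>')
  );
}
```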

edited Apr 24 '16 at 08:53

answered Apr 24 '16 at 08:40

rock321987


Any idea how to fix it? – demux Apr 24 '16 at 08:43
@demux i am writing it – rock321987 Apr 24 '16 at 08:46
The point is to move text that is touching the `span` into the `span`, so no, I'm not only concerned about the contents – demux Apr 24 '16 at 08:52
@demux then you can use the second one with capturing groups – rock321987 Apr 24 '16 at 08:55
@demux do you want to move `Given` and `'s` inside span tag? – rock321987 Apr 24 '16 at 09:01
@demux what's the final output? – rock321987 Apr 24 '16 at 09:08
It's in the question – demux Apr 24 '16 at 09:09
