I am trying to fix a regex that searches for HTML elements with an attribute named "langtoken", as well as variants such as "langtoken_title". If I search for langtoken alone, using a word boundary, it returns the results from the ~1,750,000-character string in around 0.2 seconds. However, if I omit the word boundary so that it also captures the langtoken_title attributes, this spikes to around 95 seconds.
The regex I originally had was:
<([^>\s]+) [^>]*langtoken([^>]*?(\\*)?/>|.*?<(\\*)?/\1>)
So far my attempts have changed it to:
<([^>\s]+) [^>]*langtoken(?:\b|_)(?:[^>]*/>|.*?</\1>)
I should note that in the string being searched (an HTML document) there are 1431 occurrences of elements with the langtoken attribute and only 5 with the langtoken_title attribute. I believe it is the near match that is causing the issue, but I am not sure.
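To make the comparison reproducible, here is a minimal sketch that runs both variants against a small synthetic document and reports match counts and timings. It assumes Python's re module purely for illustration (another engine may behave differently), and the sample markup and the helper names build_sample and count_and_time are made up for this example, not taken from the real document.

```python
import re
import time

# Variant with a word boundary after "langtoken" (the fast case).
WITH_BOUNDARY = re.compile(r'<([^>\s]+) [^>]*langtoken\b(?:[^>]*/>|.*?</\1>)')
# Current attempt: also accepts "langtoken_..." attribute names.
WITH_UNDERSCORE = re.compile(r'<([^>\s]+) [^>]*langtoken(?:\b|_)(?:[^>]*/>|.*?</\1>)')


def build_sample(n_plain=200, n_title=2):
    """Build a made-up HTML-like string (hypothetical markup, one element per line)."""
    plain = '<span langtoken="a">text</span><p class="x">filler</p>\n'
    title = '<span langtoken_title="b">text</span><p class="x">filler</p>\n'
    return plain * n_plain + title * n_title


def count_and_time(pattern, text):
    """Return (number of matches, elapsed seconds) for one full scan of text."""
    start = time.perf_counter()
    count = sum(1 for _ in pattern.finditer(text))
    return count, time.perf_counter() - start


if __name__ == "__main__":
    doc = build_sample()
    for name, pattern in (("boundary", WITH_BOUNDARY), ("underscore", WITH_UNDERSCORE)):
        count, seconds = count_and_time(pattern, doc)
        print(f"{name}: {count} matches in {seconds:.4f}s")
```

On a sample this small both variants finish quickly, so it only confirms what each pattern matches; timing the real 1,750,000-character document the same way would show whether the slowdown comes from the pattern or from the data.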
This is my first foray into regex, and any help in creating a more efficient expression would be appreciated.