I have a search method in which I dynamically build a regex.
For example: search term = "one word".

For a search term such as "three four", the generated regex looks like this:

(\\s*<[^>]+>\\s)*three(\\s*<[^>]+>\\s)*four

For some reason it causes the Chrome browser to hang.
Any help solving this issue would be much appreciated.

[EDIT]

The searched area looks like this:

<span id="1" title="bbox 483 1557 715 1602; x_wconf 96"><strong><em>one</em></strong></span> 
<span id="2" title="bbox 738 1557 986 1592; x_wconf 77"><strong><em>two</em></strong></span>
<span id="3" title="bbox 483 1557 715 1602; x_wconf 96"><strong><em>three</em></strong></span> 
<span id="4" title="bbox 738 1557 986 1592; x_wconf 77"><strong><em>four</em></strong></span>
<span id="5" title="bbox 483 1557 715 1602; x_wconf 96"><strong><em>five</em></strong></span> 
<span id="6" title="bbox 738 1557 986 1592; x_wconf 77"><strong><em>six</em></strong></span>

So if I search for "three four", I should get these two elements:

<span id="3" title="bbox 483 1557 715 1602; x_wconf 96"><strong><em>three</em></strong></span> 
<span id="4" title="bbox 738 1557 986 1592; x_wconf 77"><strong><em>four</em></strong></span>
Igor.r
  • 1
    https://stackoverflow.com/questions/29751230/regex-pattern-catastrophic-backtracking –  Feb 17 '20 at 14:49
  • Please show the code where you are using the regex. – brso05 Feb 17 '20 at 14:50
  • 2
    You appear to be using regex to parse HTML. That is generally considered a bad idea. –  Feb 17 '20 at 14:53
  • 2
    @Amy yes, v̸͙͋e̴̩̚ṙ̷̢y̶͙͗ b̴͔̳̙͚̓̐̇ä̴̰̞̏d̴͎̈i̴̧̨͔̬̭̠̮͙͍̒̈́̕͠d̸͙̈́͐͊̈̒̽̒͒̓͝ȩ̴̛̛̱̳̱͇͙̙̣̪̯͉̄̌͆͑́̓͜a̸͕̗̥̱̽̎͐͝ – VLAZ Feb 17 '20 at 14:56
  • @Amy Thanks for the help. Unfortunately I need to use the regex here. Can you please suggest how I can improve it? – Igor.r Feb 17 '20 at 15:04
  • 1
    Yes. Don't use the regex. Use something made to parse HTML, like the `DOMParser`. –  Feb 17 '20 at 15:07
  • What are you actually trying to match? Words where each can be arbitrarily nested in tags? E.g., `one word`? – VLAZ Feb 17 '20 at 15:13
  • @VLAZ Thank you for the help. I need the ids inside the span tags `One word` – Igor.r Feb 17 '20 at 15:19
  • Wait, what are you actually trying to match inside that? – VLAZ Feb 17 '20 at 15:20
  • I want to match the words inside the spans according to the searched term – Igor.r Feb 17 '20 at 15:21
  • 1
    A [TreeWalker](https://developer.mozilla.org/en-US/docs/Web/API/TreeWalker) with `NodeFilter.SHOW_TEXT` as a filter will extract the text nodes, and you can use DOM methods to traverse the DOM tree from there. –  Feb 17 '20 at 15:27
  • @Amy The TreeWalker definitely looks interesting, but please see the edit that I added to the question. Is it possible to get the wanted results with the TreeWalker? – Igor.r Feb 17 '20 at 16:10
  • So split the search string into words, search for each separately. I don't see the issue. –  Feb 17 '20 at 16:13
  • But there can be more occurrences of the searched word. I need to get spans only if the words inside them come one after another, as in the search term – Igor.r Feb 17 '20 at 16:19
  • 1
    Okay. The tree walker will give you a list of text nodes. You then need to use DOM methods to link the text nodes together and find the ones that are siblings. It's not going to be automatic. Your question explicitly calls for a regex solution. I've written TreeWalker solutions on I-need-a-regex-to-parse-html questions before and been downvoted. –  Feb 17 '20 at 16:22
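
A rough sketch of the approach described in the comments above, not a tested answer: walk the text nodes with a `TreeWalker`, record the `id` of the enclosing `span`, then look for runs of adjacent words that equal the search terms. `collectWords`, `findPhrase` and the `sample` array are hypothetical names made up for illustration. Case-insensitive whole-word matching and "adjacent in document order" are assumptions about what the search should do, and `collectWords` only runs where the DOM is available (for example in a browser).

```javascript
// Hypothetical helper: collect { id, text } for each non-empty text node
// under `root`, using the id of the closest ancestor <span> that has one.
// Needs a DOM (browser); it is defined here but not called.
function collectWords(root) {
  const doc = root.ownerDocument || root;
  const walker = doc.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const words = [];
  for (let node = walker.nextNode(); node; node = walker.nextNode()) {
    const text = node.nodeValue.trim();
    if (!text) continue; // skip whitespace between the spans
    const span = node.parentElement?.closest("span[id]");
    if (span) words.push({ id: span.id, text });
  }
  return words;
}

// Hypothetical helper: return the ids of every run of consecutive words
// that equals the search terms (assumed case-insensitive, whole words).
function findPhrase(words, phrase) {
  const terms = phrase.trim().toLowerCase().split(/\s+/);
  const hits = [];
  for (let i = 0; i + terms.length <= words.length; i++) {
    const run = words.slice(i, i + terms.length);
    if (run.every((w, k) => w.text.toLowerCase() === terms[k])) {
      hits.push(run.map((w) => w.id));
    }
  }
  return hits;
}

// Made-up sample data standing in for the output of collectWords().
const sample = [
  { id: "1", text: "one" },
  { id: "2", text: "two" },
  { id: "3", text: "three" },
  { id: "4", text: "four" },
];

console.log(JSON.stringify(findPhrase(sample, "three four"))); // [["3","4"]]
```

In a browser, something like `findPhrase(collectWords(new DOMParser().parseFromString(html, "text/html").body), "three four")` would feed the real markup through the same matching step, where `html` is the string being searched. Because the matching runs over a flat list of words rather than over the markup, there is no nested quantifier for the regex engine to backtrack through.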

0 Answers