The following regular expression causes a StackOverflowError when applied to a large HTML page:
<li.*?>(.|\s)*?</li>
My hypothesis is that the logical "OR" operator (|) causes recursive calls in the matcher and that, because the HTML page to be parsed is so large, this recursion overflows the stack.
Is there any way to rewrite this regular expression without the "OR" operator, given that I want to capture content that may be split over multiple lines (hence the need for \s)?
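A minimal sketch of two common rewrites, assuming Java's java.util.regex (implied by the StackOverflowError). Both drop the (.|\s) alternation: one uses the DOTALL flag so that . also matches line terminators, the other uses the character class [\s\S], which matches any character without a flag. The class name LiExtractor, the helper extract, and the sample inputs are hypothetical. My understanding is that a single-character atom under a lazy quantifier like [\s\S]*? is matched with a loop rather than one recursive call per repetition, so it should not grow the stack with the input length; treat that as an assumption about the JDK implementation rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiExtractor {
    // DOTALL makes '.' match line terminators too, so (.|\s) is no longer needed.
    // Note: with DOTALL, the '.*?' inside the opening tag may also cross lines.
    static final Pattern LI_DOTALL =
            Pattern.compile("<li.*?>(.*?)</li>", Pattern.DOTALL);

    // Flag-free alternative: [\s\S] matches any character, including newlines,
    // while the opening tag keeps the original single-line behaviour.
    static final Pattern LI_CLASS =
            Pattern.compile("<li.*?>([\\s\\S]*?)</li>");

    // Collects the content captured by group 1 for every match in the input.
    static List<String> extract(Pattern pattern, String html) {
        List<String> items = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            items.add(m.group(1));
        }
        return items;
    }

    public static void main(String[] args) {
        String html = "<ul>\n<li class=\"a\">first\nline</li>\n<li>second</li>\n</ul>";
        System.out.println(extract(LI_DOTALL, html));
        System.out.println(extract(LI_CLASS, html));
    }
}
```

For this sample both patterns capture "first\nline" and "second". The DOTALL version reads closest to the intent; the [\s\S] version is handy when you cannot pass flags, for example when the pattern string comes from configuration.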
Many thanks, Tom