I'm trying to scrape a series of webpages using PHP, grabbing all of the content between the <div id="body"> tag and the earliest </div> tag that follows it. This is the regex that I'm using:
|(?<=div id="body">).*?</div>|s
This works fine for most of the pages I'm looking at, but it returns nothing for a few others. I plugged the regex into the regex101.com tester, and it reported catastrophic backtracking as the problem. I tried removing the lookbehind, and even playing around with things like:
|id="body">.*?</div>|s
However, the problem persists. I've looked at some other questions about catastrophic backtracking, as well as the http://www.regular-expressions.info/catastrophic.html article, but I can't figure out how to apply their fixes to this particular case.
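
For reference, here is a minimal sketch of how the "returns nothing" case could be told apart from a genuine non-match in PHP. It is not the actual scraping code: the $html sample and the variable names are hypothetical, and it only assumes the pattern is run through preg_match(). preg_match() returns false (rather than 0) when PCRE aborts, and preg_last_error() then reports whether the backtrack limit was the cause.

```php
<?php
// Hypothetical sample input; the real pages would be fetched elsewhere.
$html = '<html><body><div id="body"><p>Hello</p></div><div>footer</div></body></html>';

// The pattern from the question, unchanged.
$pattern = '|(?<=div id="body">).*?</div>|s';

$result = preg_match($pattern, $html, $matches);

if ($result === false) {
    // false means PCRE gave up with an error, not that nothing matched.
    if (preg_last_error() === PREG_BACKTRACK_LIMIT_ERROR) {
        echo "Backtrack limit hit (pcre.backtrack_limit = "
            . ini_get('pcre.backtrack_limit') . ")\n";
    } else {
        echo "PCRE error code: " . preg_last_error() . "\n";
    }
} elseif ($result === 0) {
    // 0 means the pattern ran to completion and found no match.
    echo "No match\n";
} else {
    // 1 means a match was found; $matches[0] holds the matched text.
    echo $matches[0], "\n";
}
```

If the failing pages end up in the first branch, that would suggest the empty result comes from PHP's own backtracking limit rather than from the markup simply not matching.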