I'm trying to extract all the words between two phrases using the following regex:
\b(?:item\W+(?:\w+\W+){0,2}?(?:1|one)\W+(?:\w+\W+){0,3}?business)\b(.*)\b(?:item\W+(?:\w+\W+){0,2}?(?:3|three)\W+(?:\w+\W+){0,3}?legal\W+(?:\w+\W+){0,3}?proceedings)\b
The documents I'm running this regex on are 10-K filings. The filings are too long to post here (see regex101 url below for example), but basically they are something like this:
ITEM 1. BUSINESS
lots of words
ITEM 2. PROPERTIES
lots of words
ITEM 3. LEGAL PROCEEDINGS
I want to extract all the words between ITEM 1
and ITEM 3
. Note that the subtitles for each ITEM may be slightly different for each 10-K filing, hence I'm allowing for a few words between each word.
I keep getting catastrophic backtracking error, and I cannot figure out why. For example, please see https://regex101.com/r/zgTiyb/1.
What am I doing wrong?