I'm trying to capture the output of the Stanford CoreNLP dependency parser using a regex. I want to capture the dependency parse, which spans several lines (everything between "dependencies):" and "Sentence"). A sample of the data:
Dependency Parse (enhanced plus plus dependencies):
root(ROOT-0, imply-5)
dobj(imply-5, what-1)
aux(imply-5, does-2)
det(man-4, the-3)
nsubj(imply-5, man-4)
advmod(mentions-8, when-6)
nsubj(mentions-8, he-7)
advcl(imply-5, mentions-8)
det(papers-10, the-9)
dobj(mentions-8, papers-10)
nsubj(written-13, he-11)
aux(written-13, has-12)
acl:relcl(papers-10, written-13)
Sentence #1 (10 tokens):
The code I'm using is:
import re
# Raw string so that \) reaches the regex engine unchanged
regex = re.compile(r'dependencies\):(.*)Sentence', re.DOTALL)
found = regex.findall(text)
When I run this, the match covers the whole text document rather than just the part I want in the capture group. It works fine when I try it out on Regexr.
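A minimal sketch that reproduces the behaviour, assuming the real document contains several sentences one after another. The shortened two-sentence `text` and the names `greedy` and `lazy` are made up for illustration and are not the real data. It compares the greedy pattern above with a non-greedy variant, `(.*?)`, which may be what is needed here:

```python
import re

# Hypothetical, shortened sample in the same layout as the CoreNLP text output.
text = (
    "Sentence #1 (3 tokens):\n"
    "Dependency Parse (enhanced plus plus dependencies):\n"
    "root(ROOT-0, imply-5)\n"
    "dobj(imply-5, what-1)\n"
    "Sentence #2 (2 tokens):\n"
    "Dependency Parse (enhanced plus plus dependencies):\n"
    "root(ROOT-0, written-2)\n"
    "nsubj(written-2, he-1)\n"
    "Sentence #3 (1 tokens):\n"
)

# Greedy: .* runs to the LAST "Sentence" in the document.
greedy = re.compile(r'dependencies\):(.*)Sentence', re.DOTALL)
# Non-greedy: .*? stops at the NEXT "Sentence" after each header line.
lazy = re.compile(r'dependencies\):\n(.*?)Sentence', re.DOTALL)

greedy_found = greedy.findall(text)
lazy_found = lazy.findall(text)

print(len(greedy_found), len(lazy_found))
```

With this sample the greedy pattern returns a single string that runs across both parses, while the non-greedy one returns one string per parse. A single-sentence test string pasted into Regexr would not show the difference, which might explain why it looked fine there.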
Any help is much appreciated.