I have an XML file with a structure like this:
<word id="15" pos="SS">
<token>infarto</token>
<lemmas>infarto</lemmas>
</word>
<word id="16" pos="AS">
<token>miocardico</token>
<lemmas>miocardico</lemmas>
</word>
<word id="17" pos="AS" annotated="head">
<token>acuto</token>
<lemmas>acuto</lemmas>
</word>
<word id="18" pos="E">
<token>in</token>
<lemmas>in</lemmas>
</word>
<word id="19" pos="SS">
<token>corso</token>
<lemmas>corso</lemmas>
</word>
What I'm trying to do is get the values of "pos" and "token" for the words surrounding the one with word id 17 (the one with annotated="head").
This is no problem for the words coming after word 17:
(pos=")(.+)(")(\s\S+?)("head")([\s\S]+?)(>)(\w+?)(<+)([\S\s]+?)(pos=")(.+)(")([\s\S]+?)(token>)(.+)(<)([\s\S]+?)
This gets me all the information I want, and if I want to include more words I can just add
(pos=")(.+)(")([\s\S]+?)(token>)(.+)(<)([\s\S]+?)
to the end. It isn't pretty, but it works.
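For illustration, the forward case can be sketched in Python. The use of Python's `re` module and the names `xml`, `forward`, and `extension` are assumptions made for this example only, and other regex flavors may behave slightly differently. The pattern is written here without any whitespace between the groups.

```python
import re

# Sample data copied from the XML excerpt above.
xml = """<word id="15" pos="SS">
<token>infarto</token>
<lemmas>infarto</lemmas>
</word>
<word id="16" pos="AS">
<token>miocardico</token>
<lemmas>miocardico</lemmas>
</word>
<word id="17" pos="AS" annotated="head">
<token>acuto</token>
<lemmas>acuto</lemmas>
</word>
<word id="18" pos="E">
<token>in</token>
<lemmas>in</lemmas>
</word>
<word id="19" pos="SS">
<token>corso</token>
<lemmas>corso</lemmas>
</word>
"""

# Head word plus the first word after it.
forward = (r'(pos=")(.+)(")(\s\S+?)("head")([\s\S]+?)(>)(\w+?)(<+)([\S\s]+?)'
           r'(pos=")(.+)(")([\s\S]+?)(token>)(.+)(<)([\s\S]+?)')
# Appending this block once picks up one more following word.
extension = r'(pos=")(.+)(")([\s\S]+?)(token>)(.+)(<)([\s\S]+?)'

m = re.search(forward + extension, xml)

# Group numbers follow the order of the opening parentheses.
head = (m.group(2), m.group(8))     # pos and token of the head word
next1 = (m.group(12), m.group(16))  # first word after the head
next2 = (m.group(20), m.group(24))  # second word after the head
print(head, next1, next2)
```

In this sketch the match starts at the `pos="` of word 17, because the earlier `pos="` occurrences are not followed by `"head"` on the same line.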
Now when I want to go in the other direction, I'm absolutely stumped:
(pos=")(.+)(")([\s\S]+?)(token>)(.+)(<)([\s\S]+?)(pos=")(.+)(")(\s\S+?)("head")
Instead of matching only the information of word 16 (the first one in front of the annotated="head" word), it matches all the information that comes before it (word 15, word 14, word 13, etc.).
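The behaviour can be reproduced with a short Python sketch. Again, Python's `re` module and the names `xml` and `backward` are assumptions for illustration, and the sample only goes back as far as word 15.

```python
import re

# Sample data copied from the XML excerpt above (words 15 to 17 are enough here).
xml = """<word id="15" pos="SS">
<token>infarto</token>
<lemmas>infarto</lemmas>
</word>
<word id="16" pos="AS">
<token>miocardico</token>
<lemmas>miocardico</lemmas>
</word>
<word id="17" pos="AS" annotated="head">
<token>acuto</token>
<lemmas>acuto</lemmas>
</word>
"""

# The backward pattern exactly as written above.
backward = (r'(pos=")(.+)(")([\s\S]+?)(token>)(.+)(<)([\s\S]+?)'
            r'(pos=")(.+)(")(\s\S+?)("head")')

m = re.search(backward, xml)

first_pos, first_token = m.group(2), m.group(6)  # intended to be word 16
head_pos = m.group(10)                           # pos of the head word
filler = m.group(8)                              # text skipped by the lazy group
print(first_pos, first_token, head_pos)
print("miocardico" in filler)
```

With this sample, the first `pos` and `token` captured belong to word 15 rather than word 16, and all of word 16 ends up inside the lazy `([\s\S]+?)` filler group instead of in the capturing groups of interest.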
What am I missing?
P.S. Using an XML parser is sadly not an option.