I have a blob of text like this:
6.9 fdafsaf
dfasfsdafasdf
asdfsdaf.
asdfasfsa
6.9.1 asdfasdffsdaasdfdfasasdf
adfdafsdfasdfassdfa.
asdfasdf.asdf.
6.10.1 header
adfsfdasfadfasd.
asdfasdfsa.asdf.
asdfasdf.
<?xml version="1.0" encoding="utf-8"?>
....
</xs:schema>
I want to extract two things:
- The header closest to the XML (the last one before it):
6.10.1 header
- The XML:
<?xml version="1.0" encoding="utf-8"?>
....
</xs:schema>
So I match the header:
(\d+\.\d+\.\d+.*)
Then lazily match the text in between:
[\s\S]*?
Then the XML:
(<\?xml[\s\S]*?<\/xs:schema>)
However, the match I get also includes the previous header:
Full match:
6.9.1 asdfasdffsdaasdfdfasasdf
adfdafsdfasdfassdfa
6.10.1 header
adfsfdasfadfasd
<?xml version="1.0" encoding="utf-8"?>
....
</xs:schema>
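A minimal Python sketch that reproduces this (Python is only chosen for illustration; the sample text is the blob above with the XML body left as the `....` placeholder, and `TEXT`/`PATTERN` are illustrative names). It suggests why the lazy quantifier does not help: the regex engine returns the leftmost match it can find, and laziness only makes the match as short as possible from that starting point. It never moves the start forward to a later header.

```python
import re

# Sample input modelled on the blob in the question; the XML body is the
# same "...." placeholder, so this is an illustration, not real data.
TEXT = (
    "6.9 fdafsaf\n"
    "dfasfsdafasdf\n"
    "asdfsdaf.\n"
    "asdfasfsa\n"
    "6.9.1 asdfasdffsdaasdfdfasasdf\n"
    "adfdafsdfasdfassdfa.\n"
    "asdfasdf.asdf.\n"
    "6.10.1 header\n"
    "adfsfdasfadfasd.\n"
    "asdfasdfsa.asdf.\n"
    "asdfasdf.\n"
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "....\n"
    "</xs:schema>\n"
)

# The full expression from the question, unchanged.
PATTERN = re.compile(r"(\d+\.\d+\.\d+.*)[\s\S]*?(<\?xml[\s\S]*?<\/xs:schema>)")

match = PATTERN.search(TEXT)

# The first position where the whole pattern can succeed is "6.9.1", so the
# engine stops there; the later start at "6.10.1" is never reported.
print(match.group(1))                     # 6.9.1 asdfasdffsdaasdfdfasasdf
print("6.10.1 header" in match.group(0))  # True
```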
Clearly, the lazy quantifier between the header and the XML is not doing what I want. What I really want is the first match in which the text between the two contains no other header.
How do I do this?
Full expression:
(\d+\.\d+\.\d+.*)[\s\S]*?(<\?xml[\s\S]*?<\/xs:schema>)
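One way to express "no header in between" is a tempered greedy token: replace `[\s\S]*?` with `(?:(?!HEADER)[\s\S])*?`, so each character is consumed only if no header starts at that position. A match attempt starting at "6.9.1" then dead-ends at "6.10.1", and the engine moves on to a start at "6.10.1" itself. Below is a sketch in Python, assuming headers are exactly what the question's `\d+\.\d+\.\d+` matches and reusing the same illustrative sample text.

```python
import re

# Same illustrative sample text as the reproduction (XML body is a placeholder).
TEXT = (
    "6.9 fdafsaf\n"
    "dfasfsdafasdf\n"
    "asdfsdaf.\n"
    "asdfasfsa\n"
    "6.9.1 asdfasdffsdaasdfdfasasdf\n"
    "adfdafsdfasdfassdfa.\n"
    "asdfasdf.asdf.\n"
    "6.10.1 header\n"
    "adfsfdasfadfasd.\n"
    "asdfasdfsa.asdf.\n"
    "asdfasdf.\n"
    '<?xml version="1.0" encoding="utf-8"?>\n'
    "....\n"
    "</xs:schema>\n"
)

PATTERN = re.compile(
    r"(\d+\.\d+\.\d+.*)"               # group 1: the header line, as in the question
    r"(?:(?!\d+\.\d+\.\d+)[\s\S])*?"   # filler: one char at a time, never where a header starts
    r"(<\?xml[\s\S]*?</xs:schema>)"    # group 2: the XML document
)

match = PATTERN.search(TEXT)
print(match.group(1))                  # 6.10.1 header
print(match.group(2).splitlines()[0])  # <?xml version="1.0" encoding="utf-8"?>
```

A caveat with this sketch: any `x.y.z`-looking text in the filler (a version number such as `1.2.3`, for example) would also block the match. If headers always begin a line, which is an assumption here, anchoring both header patterns with `^` and compiling with `re.MULTILINE` would narrow that.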