Possible Duplicate:
Matching Nested Structures With Regular Expressions in Python
I am trying to match a single group of data from a wiki page. The Python code I'm using is listed below. The issue is that the match runs past the end of its own group, all the way to the last }} in the page.
import re

def findPersonInfo(self):
    if self.isPerson:
        # Greedy (.*) with DOTALL runs on to the last }} in the page
        regex = re.compile(r"{{persondata(.*)}}", re.IGNORECASE | re.UNICODE | re.DOTALL)
        result = regex.search(self._rawPage)
        if result:
            print 'Match found: ', result.group()
A sample of the wiki page content:
*[http://www.jsc.nasa.gov/Bios/htmlbios/acaba-jm.html NASA biography]
{{NASA Astronaut Group 19}}
{{Persondata
|NAME= Acaba, Joseph Michael "Joe"
|ALTERNATIVE NAMES=
|SHORT DESCRIPTION=[[Hydrogeologist]]
|DATE OF BIRTH={{Birth date and age|1967|5|17}}
|PLACE OF BIRTH=[[Inglewood, California]]
|DATE OF DEATH=
|PLACE OF DEATH=
}}
{{DEFAULTSORT:Acaba, Joseph M.}}
[[Category:1967 births]]
My current regex is returning the following string:
{{Persondata
|NAME= Acaba, Joseph Michael "Joe"
|ALTERNATIVE NAMES=
|SHORT DESCRIPTION=[[Hydrogeologist]]
|DATE OF BIRTH={{Birth date and age|1967|5|17}}
|PLACE OF BIRTH=[[Inglewood, California]]
|DATE OF DEATH=
|PLACE OF DEATH=
}}
{{DEFAULTSORT:Acaba, Joseph M.}}
I would like it to return:
{{Persondata
|NAME= Acaba, Joseph Michael "Joe"
|ALTERNATIVE NAMES=
|SHORT DESCRIPTION=[[Hydrogeologist]]
|DATE OF BIRTH={{Birth date and age|1967|5|17}}
|PLACE OF BIRTH=[[Inglewood, California]]
|DATE OF DEATH=
|PLACE OF DEATH=
}}
The tricky bit is that it needs to count the other {{ opens and }} closes to know which group to stop at, but I'm not sure how to get a regex to do that.
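For illustration, the counting described above could be sketched roughly like this. It is a minimal sketch, not a definitive solution. Python's built-in re module has no recursive patterns, so this version uses a regex only to find the {{ and }} tokens and keeps the nesting depth in plain code. The helper name find_template and its parameters are hypothetical, and it assumes braces appear only as balanced {{ / }} pairs (no {{{triple-brace}}} parameters).

```python
import re

# Matches either an opening {{ or a closing }} token.
BRACES = re.compile(r"\{\{|\}\}")

def find_template(text, name):
    """Return the whole {{name ...}} block, including nested {{ }} pairs.

    Hypothetical helper for illustration; returns None if the template
    is missing or its braces never balance.
    """
    opening = re.search(r"\{\{\s*" + re.escape(name), text, re.IGNORECASE)
    if opening is None:
        return None

    depth = 0
    # Scan brace tokens starting at the template's own opening {{.
    for token in BRACES.finditer(text, opening.start()):
        depth += 1 if token.group() == "{{" else -1
        if depth == 0:
            # Back at depth zero: this }} closes the template we started in.
            return text[opening.start():token.end()]
    return None  # ran out of text before the braces balanced

page = (
    "{{NASA Astronaut Group 19}}\n"
    "{{Persondata\n"
    "|DATE OF BIRTH={{Birth date and age|1967|5|17}}\n"
    "|PLACE OF DEATH=\n"
    "}}\n"
    "{{DEFAULTSORT:Acaba, Joseph M.}}\n"
)
print(find_template(page, "Persondata"))
```

With the sample above, the nested {{Birth date and age|...}} raises the depth to 2 and drops it back to 1, so the scan stops only at the }} that closes Persondata and never reaches {{DEFAULTSORT}}.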