I am trying to extract a field value from a BibTeX entry using a regex in Python. Here is part of my string:
a = '''title = {The Origin ({S},
{Se}, and {Te})- {TiO$_2$} Photocatalysts},
year = {2010},
volume = {114},'''
I want to grab the string for the title, which is:
The Origin ({S},
{Se}, and {Te})- {TiO$_2$} Photocatalysts
I currently have this code:
import re
pattern = re.compile(r'title\s*=\s*{(.*|\n?)},\s*\n', re.DOTALL|re.I)
pattern.findall(a)
But it matches too much and gives me:
['The Origin ({S},\n {Se}, and {Te})- {TiO$_2$} Photocatalysts},\n year = {2010']
How can I get the whole title string without the year information? Often, year does not come right after title, so I cannot use:
pattern = re.compile(r'title\s*=\s*{(.*|\n?)},\s*\n.*year', re.DOTALL|re.I)
pattern.findall(a)
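For reference, one possible direction is to use the regex only to find where the field starts and then count braces to find where it ends. This is a minimal sketch, not a full BibTeX parser: the helper name extract_field is hypothetical, and it assumes the value is delimited by braces (not quotes) and contains no escaped braces such as \{. Simply making the quantifier lazy (.*?) would likely not be enough for this input, because the title itself contains "{S}," followed by a newline, which already satisfies the "},\s*\n" terminator.

```python
import re

def extract_field(entry, field="title"):
    """Return the brace-delimited value of `field`, or None if not found.

    Hypothetical helper for illustration; assumes `field = {...}` syntax
    with balanced, unescaped braces.
    """
    # Find "field = {" case-insensitively; \b avoids matching e.g. "booktitle".
    match = re.search(r'\b' + re.escape(field) + r'\s*=\s*\{', entry, re.I)
    if match is None:
        return None
    start = match.end()  # first character after the opening brace
    depth = 1            # we are inside one open brace
    for i in range(start, len(entry)):
        ch = entry[i]
        if ch == '{':
            depth += 1
        elif ch == '}':
            depth -= 1
            if depth == 0:
                # This brace closes the field value.
                return entry[start:i]
    return None  # braces never balanced

a = '''title = {The Origin ({S},
{Se}, and {Te})- {TiO$_2$} Photocatalysts},
year = {2010},
volume = {114},'''

print(extract_field(a))
```

Because the end of the value is found by brace depth rather than by what follows it, this does not depend on year (or any other field) coming right after title.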