I am fairly new to python so I apologies if this is quite a novice question, but I am trying to extract text from parentheses that has specific format from a raw text file. I have tried this with regular expressions, but please let me know if their is a better method.
To show what I want to do by example:
s = "Testing (Stackoverflow, 2013). Testing (again) (Stackoverflow, 1999)"
From this string I want a result something like:
['(Stackoverflow, 2013)', '(Stackoverflow, 1999)']
The regular expression I have tried so far is
"(\(.+[,] [0-9]{4}\))"
in conjunction with re.findall(), however this only gives me the result:
['(Stackoverflow, 2013). Testing (again) (Stackoverflow, 1999)']
So, as you may have guessed, I am trying to extract the bibliographic references from a .txt file. But I don't want to extract anything that happens to be in parentheses that is not a bibliographic reference.
Again, I apologies if this is novice, and again if there is a question like this out there already. I have searched, but no luck as yet.