This is driving me insane.
I'm trying to find every instance of "DOI" or its mis-scanned equivalents in a series of documents. For each one, I want to collect the "DOI" term plus up to 15 alphanumeric characters that follow it. I also need to find these matches even when they overlap with a previous match.
I've tried to adapt a previous solution I was given for a similar problem, but with no success:
Python regex find all overlapping matches?
Here is the example I'm using to test this.
String to search:
"abhgfigDOI567afkgD0Idhdhfhfhdbvbkab3343432q3DO1fbaguig7ggkgafgkgDOIDOID01OO1"
DOI variations:
DOI|DO1|D01|D0I|001|00I|0O1|0OI|O01|O0I|OO1|OOI
Expected results:
["DOI567afkgD0Idhdhf",
"D0Idhdhfhfhdbvbkab",
"DO1fbaguig7ggkgafg",
"DOIDOID01OO1",
"DOID01OO1",
"D01OO1",
"OO1"]
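For reference, the behaviour described above could be sketched with a capturing group inside a zero-width lookahead, which is the usual way to get overlapping matches out of Python's `re` module. This is only a sketch under assumptions: "alphanumeric" is taken to mean ASCII letters and digits, and the names `VARIANTS`, `PATTERN` and `find_doi_candidates` are hypothetical, chosen for illustration.

```python
import re

# The twelve "DOI" variants listed in the question.
VARIANTS = "DOI|DO1|D01|D0I|001|00I|0O1|0OI|O01|O0I|OO1|OOI"

# The lookahead (?=...) consumes no characters, so the regex engine retries
# at every position and overlapping hits are not skipped. The inner group
# captures a variant plus up to 15 following alphanumerics
# (assumption: ASCII letters and digits only).
PATTERN = re.compile(rf"(?=((?:{VARIANTS})[A-Za-z0-9]{{0,15}}))")


def find_doi_candidates(text):
    """Return every variant plus up to 15 trailing alphanumerics, overlaps included."""
    return [m.group(1) for m in PATTERN.finditer(text)]


text = "abhgfigDOI567afkgD0Idhdhfhfhdbvbkab3343432q3DO1fbaguig7ggkgafgkgDOIDOID01OO1"
for candidate in find_doi_candidates(text):
    print(candidate)
```

On the test string this should give seven matches, the last being "OO1" (with letter Os), since that is how the input ends. As all twelve variants are three characters long, the alternation could also be written more compactly as `[D0O][0O][1I]`, which covers exactly the same combinations.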
Any assistance would be most appreciated!
Thanks!