
I have a string that can exist in either of the following two formats within a larger body of text:

OptionalSpecificString1 1234
1234 OptionalSpecificString2

All of the text here is placeholder text. I'm looking for a numerical string that is either preceded or followed by a specific string. One of the two optional specific strings will always be present, and it is needed to locate and capture the numerical string of interest. Is there a single regex pattern that can express this?

Something like:

(?:OptionalSpecificString1)? (\d+) (?:OptionalSpecificString2)?

almost does it, but it doesn't require either of the optional strings to be present, so it could end up matching any other number in the body of the text. I know I could do something like:

(OptionalSpecificString1 (\d+)|(\d+) OptionalSpecificString2)

but I'm wondering if there's something a little more elegant. I'm doing this with the Python re module, and the code would also be simpler if a single capture group held the number in both cases.
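To make the difference between the two patterns concrete, here is a small sketch that runs both of them, with the literal spaces exactly as written above, over a made-up one-line sample. The sample text and the unrelated number 42 are assumptions for illustration only. The first pattern also picks up 42. The alternation does not, but because it has three groups, `re.findall` returns a tuple per match, and the number sits in a different group depending on which branch matched.

```python
import re

# Made-up sample: two wanted numbers plus an unrelated 42.
text = 'see OptionalSpecificString1 1234 and 42 and 5678 OptionalSpecificString2 here'

# Both surrounding strings are optional, so any space-delimited number matches.
loose = r'(?:OptionalSpecificString1)? (\d+) (?:OptionalSpecificString2)?'
loose_hits = re.findall(loose, text)
print(loose_hits)  # ['1234', '42', '5678']  (42 should not be there)

# Alternation: only the wanted numbers match, but with three groups
# findall returns one tuple per match and the number moves between groups.
strict = r'(OptionalSpecificString1 (\d+)|(\d+) OptionalSpecificString2)'
strict_hits = re.findall(strict, text)
print(strict_hits)
# [('OptionalSpecificString1 1234', '1234', ''), ('5678 OptionalSpecificString2', '', '5678')]
```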

beeftendon
  • Did you have 1234 in all your text? – Sivaram Rasathurai Oct 10 '20 at 01:00
  • Sorry, the 1234 is just placeholder text as well. I'll update the question to be a bit more clear about that. – beeftendon Oct 10 '20 at 01:01
  • Did you check this one? https://stackoverflow.com/questions/1454913/regular-expression-to-find-a-string-included-between-two-characters-while-exclud – Sivaram Rasathurai Oct 10 '20 at 01:03
  • Thanks, unless I'm misunderstanding that post, I don't think it quite answers my question. I've made some major edits to my question though, as I realized I framed it very poorly. Edit: Ah ha, read the solutions a bit more carefully, seems like it might apply to my case after all. Didn't think to search in those terms. Will give it a try later when I have time. – beeftendon Oct 10 '20 at 01:06

1 Answer


If Python allowed a named group to be redefined, the solution could simply make the two layouts alternatives of one regex: OptionalSpecificString1\s*(?P<numeric>\d+)|(?P<numeric>\d+)\s*OptionalSpecificString2.
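A quick sketch to confirm that limitation: the standard `re` module refuses to compile the pattern above, where the group name `numeric` appears in both alternatives, and raises `re.error` instead of treating the two groups as one.

```python
import re

# The same group name "numeric" is used in both alternatives.
duplicate_name = (
    r'OptionalSpecificString1\s*(?P<numeric>\d+)'
    r'|(?P<numeric>\d+)\s*OptionalSpecificString2'
)

try:
    re.compile(duplicate_name)
    compiled_ok = True
except re.error as exc:
    # The message reports the redefinition of the group name.
    compiled_ok = False
    print('re.error:', exc)
```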

As it doesn't, you could capture the numerical values into different groups, named or not, and pick whichever one actually matched back in Python code, like this:

import re
text = r'''
OptionalSpecificString1 1234
An irrelevant line
5678 OptionalSpecificString2
Another irrelevant line
'''

pattern = r'OptionalSpecificString1\s*(?P<numeric1>\d+)|(?P<numeric2>\d+)\s*OptionalSpecificString2'

numerics = []
for match in re.finditer(pattern, text):
    # Only one alternative matches, so the other group is None and `or` skips it.
    numerics.append(match.group('numeric1') or match.group('numeric2'))

print(numerics)  # ['1234', '5678']
Alexander Mashin
  • Thanks, this is certainly a Pythonic way to do it. I was trying to loop through a list of different patterns with unique capture groups so I wanted to avoid anything that explicitly references the keys since I don't necessarily know which keys are applicable beforehand. However, I could just wrap the output in another loop to clean out any empty dict values. Still, I'd be interested to know if there's any regex-side way to accomplish what I was asking. – beeftendon Oct 14 '20 at 16:51
  • In the solution I proposed, `numerics` is not supposed to contain any empty values, so it will not need cleaning. – Alexander Mashin Oct 14 '20 at 16:58
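For the regex-side approach asked about in the comment above, one option (a sketch, not part of the original answer) is to put a single capture group after an alternation. The first branch consumes the leading string. The second branch consumes nothing and only uses a lookahead to assert that the digits at this position are followed by the trailing string. Either way, group 1 holds the number, so `re.findall` returns plain strings with no per-pattern key handling. The placeholder strings and sample text are taken from the question and answer.

```python
import re

text = '''
OptionalSpecificString1 1234
An irrelevant line
5678 OptionalSpecificString2
Another irrelevant line
'''

# Exactly one capture group, shared by both layouts:
#   branch 1 consumes the leading string, then (\d+) takes the number;
#   branch 2 consumes nothing and only asserts (lookahead) that the digits
#   here are followed by the trailing string, then (\d+) takes them.
single_group = re.compile(
    r'(?:OptionalSpecificString1\s*|(?=\d+\s*OptionalSpecificString2))(\d+)'
)

# With a single group, findall returns that group's text for each match.
numerics = single_group.findall(text)
print(numerics)  # ['1234', '5678']
```

The leading string is consumed instead of being checked with a lookbehind because Python's `re` only accepts fixed-width lookbehinds, so something like `(?<=OptionalSpecificString1\s*)` will not compile. Because the first branch consumes the marker, `match.group(0)` includes it; use `match.group(1)` when iterating with `finditer`. If you keep the answer's two-group pattern instead, `match.group(match.lastindex)` returns whichever group took part in the match without naming it, since each branch contains only one group.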