I'm looking for a regex pattern that matches the following string:
Some example text (SET) that demonstrates what I'm looking for. Energy system models (ESM) are used to find specific optima (SCO). Some say computer systems (CUST) are cool. In the summer playing outside (OUTS) should be preferred.
My goal is to match the following:
Some example text (SET)
Energy system models (ESM)
specific optima (SCO)
computer systems (CUST)
outside (OUTS)
The important part is that it's not always exactly three words and their first letter. Sometimes the letters used for the abbreviation are merely contained in the preceding words. That's why I started looking into the positive lookbehind. However, it is constrained by length, which can be worked around by combining it with a positive lookahead. So far, I couldn't come up with a robust solution, though.
What I've tried so far:
(\b[\w -]+?)\((([A-Z])(?<=(?=.*?\3))(?:[A-Z]){1,4})\)
This works reasonably well, but the matches include too many words:
Some example text (SET)
Energy system models (ESM)
are used to find specific optima (SCO)
Some say computer systems (CUST)
In the summer playing outside (OUTS)
I have also tried to use a reference to the first letter of the abbreviation at the start of the first group. That didn't work at all though.
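To make the intended rule concrete, here is a minimal Python sketch of the matching logic. It is not a pure-regex solution. It assumes that an abbreviation is 2 to 5 uppercase letters in parentheses, that its first letter starts the first word of the phrase, and that the remaining letters appear in order somewhere in the phrase. The names `find_abbreviations`, `is_subsequence` and `max_words` are made up for illustration.

```python
import re

ABBR = re.compile(r"\(([A-Z]{2,5})\)")  # assumption: 2-5 uppercase letters
WORD = re.compile(r"[\w-]+")


def is_subsequence(needle, haystack):
    """True if the characters of needle occur in haystack in order."""
    it = iter(haystack)
    return all(ch in it for ch in needle)


def find_abbreviations(text, max_words=6):
    """Return the shortest 'phrase (ABBR)' span for each abbreviation."""
    results = []
    for m in ABBR.finditer(text):
        abbr = m.group(1).lower()
        # Words in front of the parenthesis, limited to the last max_words.
        words = list(WORD.finditer(text, 0, m.start()))[-max_words:]
        # Grow the phrase backwards one word at a time; keep the first fit.
        for k in range(1, len(words) + 1):
            start = words[-k].start()
            phrase = text[start:m.start()].lower()
            if phrase[0] == abbr[0] and is_subsequence(abbr, phrase):
                results.append(text[start:m.end()])
                break
    return results


text = (
    "Some example text (SET) that demonstrates what I'm looking for. "
    "Energy system models (ESM) are used to find specific optima (SCO). "
    "Some say computer systems (CUST) are cool. "
    "In the summer playing outside (OUTS) should be preferred."
)

for match in find_abbreviations(text):
    print(match)
```

On the example string this prints the five desired matches. Growing the phrase backwards and stopping at the first fit is what keeps leading words such as "are used to find" out of the result.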