
I am looking to use Regex to find all instances of a certain letter in a given string, but NOT if that letter appears in a larger word/phrase. For example:

For test string:

lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)

I want to obtain all instances of the letter 'a' excluding the letter 'a' that appears in the following three words:

lag max pctrange

i.e. I would like to use Regex to get all instances of the letter 'a' as highlighted here:

lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)

I attempted to use the following Regex but it keeps including the character after my desired letter 'a':

a[^"lag|max|pctrange"]

To provide some context, I'm in Python looking to replace these 'a' instances using the re module:

import re
string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
words = ["lag", "max", "pctrange"]
replace = "_"
re.sub(f"a[^\"{'|'.join(words)}\"]", replace, string)

This results in the (undesired) output:

lag(_1) + 252*_+ max(3_2) / 5*pctrange(_10)

I would like instead for the output to be the following:

lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)

Edit: Note that the search isn't always for a single letter. For example, sometimes I want to search for "aa" instead of "a", or "bdg" instead of "a", etc. What matters most is the list of words to be excluded (in the above example, "lag", "max" and "pctrange"). I don't need to ignore anything other than the specific words that show up in this list. Thank you.

3 Answers

3

I think what you are looking for are word boundaries:

The following regex matches a only if it sits between two word boundaries, or if it is directly preceded by a digit and followed by a word boundary:

(?<=\d)a\b|\ba\b

https://regex101.com/r/7IfinZ/1
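
To check this pattern outside regex101, here is a minimal sketch using Python's standard re module and the test string from the question. The variable names are only illustrative.

```python
import re

string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"

# Match "a" when it stands alone between word boundaries,
# or when it directly follows a digit and ends at a word boundary (as in "3a").
pattern = re.compile(r"(?<=\d)a\b|\ba\b")

result = pattern.sub("_", string)
print(result)  # lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)
```

Note that this relies on the shape of the input rather than on the list of excluded words, so an "a" inside any other word is skipped as well.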

3

To prevent a from being matched if adjacent to another letter try negative lookarounds.

(?i)(?<![a-z])a(?![a-z])

See this demo at regex101. The (?i) flag makes the match case-insensitive, so [a-z] behaves like [a-zA-Z].
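
The same lookarounds can be wrapped around any search term, not just a single letter. The sketch below assumes a hypothetical helper named replace_standalone (not part of the original answer) and uses only the standard re module.

```python
import re

def replace_standalone(text, target, replacement="_"):
    """Replace target only where no ASCII letter touches it on either side."""
    # re.escape keeps the target literal even if it contains regex metacharacters.
    pattern = rf"(?i)(?<![a-z]){re.escape(target)}(?![a-z])"
    return re.sub(pattern, replacement, text)

string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
print(replace_standalone(string, "a"))        # lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)
print(replace_standalone("lag(aa,1)", "aa"))  # lag(_,1)
```

This skips every occurrence that touches a letter, not only the ones inside the listed words, so it fits best when the search term is always a standalone variable name.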


Update: To skip certain words and match the remaining a try PyPI regex using verbs (*SKIP)(*F).

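import regex as re  # third-party PyPI module; the standard re module lacks (*SKIP)(*F)
words = ["lag", "max", "pctrange"]
string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
string = re.sub(fr"\b(?i:{'|'.join(words)})\b(*SKIP)(*F)|a", "_", string)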

Another demo at regex101 or see a Python demo at tio.run

Whatever matches on the left side of the | alternation is skipped, and whatever matches on the right side is kept as the match. The words sit inside a (?i: ... ) non-capturing group with the ignore-case flag, surrounded by \b word boundaries.
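
If installing the third-party regex module is not an option, a similar skip-or-match effect can be approximated with the standard re module. This is a different technique from (*SKIP)(*F): the excluded words are matched first and written back unchanged by a replacement function, and only the captured search term is replaced. The helper name replace_outside_words is an assumption made for this sketch.

```python
import re

def replace_outside_words(text, target, words, replacement="_"):
    """Replace target everywhere except inside the listed whole words."""
    skip = "|".join(re.escape(w) for w in words)
    # Left branch: an excluded word (case-insensitive), consumed but not captured.
    # Right branch: the search term, captured in group 1.
    pattern = re.compile(rf"\b(?i:{skip})\b|({re.escape(target)})")

    def swap(match):
        # Group 1 is set only when the right branch matched.
        if match.group(1) is not None:
            return replacement
        return match.group(0)  # excluded word: put it back unchanged

    return pattern.sub(swap, text)

string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
words = ["lag", "max", "pctrange"]
print(replace_outside_words(string, "a", words))        # lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)
print(replace_outside_words("lag(aa,1)", "aa", words))  # lag(_,1)
```

Because the left branch consumes the whole excluded word, the search term can never match inside it, which is the same idea the verbs express more compactly.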

bobble bubble
  • Thank you, this works but i didn't mention that sometimes i'm looking for other patterns such as "aa" or "bdg" etc, not just "a". For example if i want to highlight "aa" in a string like "lag(aa,1)", what Regex can I use? Assume that I always have the list of words to ignore (e.g. "lag"). I have edited my original question to include this additional criteria. – dimitriapostol Nov 03 '22 at 10:33
  • I added [another solution](https://regex101.com/r/p8wB3L/1) for skipping words using [PyPI regex](https://pypi.org/project/regex/) to the answer. Btw can't see yet why this lookaround-solution would not [be good for `aa`](https://regex101.com/r/0EgY2p/2). – bobble bubble Nov 03 '22 at 12:06
1

This regex focuses on matching variables but excluding certain words:

[a-z]++(?<!lag|pctrange|max)

https://regex101.com/r/zBHPQu/1

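What makes this regex work is the possessive quantifier (++): [a-z]++ consumes the longest possible run of lowercase letters and never gives any of it back, so the negative lookbehind is only checked at the end of each run. If the run ends in one of the listed words, the match fails as a whole instead of backtracking to a shorter match.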
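
The same "take the whole run of letters, then decide" idea can be sketched with the standard re module alone. This is not the possessive pattern above but a substitute approach: each maximal run of letters is matched, and a replacement function decides whether to keep it. The helper name replace_token is hypothetical.

```python
import re

def replace_token(text, target, excluded, replacement="_"):
    """Replace target only where it forms a complete run of letters."""
    excluded_lower = {w.lower() for w in excluded}

    def swap(match):
        token = match.group(0)
        # Keep excluded words and any run that is not exactly the target.
        if token.lower() in excluded_lower or token != target:
            return token
        return replacement

    # re.sub scans left to right, so each match is a maximal run of letters.
    return re.sub(r"[A-Za-z]+", swap, text)

string = "lag(a,1) + 252*a + max(3a*2) / 5*pctrange(a,10)"
words = ["lag", "max", "pctrange"]
print(replace_token(string, "a", words))        # lag(_,1) + 252*_ + max(3_*2) / 5*pctrange(_,10)
print(replace_token("lag(aa,1)", "aa", words))  # lag(_,1)
```

Since the decision is made per run of letters, the same function handles a single letter such as "a" and longer names such as "aa" or "bdg" without changing the pattern.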

  • Thank you! For anyone in Python looking to get this to work, use the `regex` module instead of `re` – dimitriapostol Nov 03 '22 at 13:20
  • @dimitriapostol if this solves your issue, please accept it as the best answer so other users facing the same problem can find a solution – Cristiano Schiaffella Nov 03 '22 at 13:25
  • Done, thank you Cristiano. Quick follow up question: instead of the generic [a-z], lets say i want to specifically search for just "a" or "aa" etc... how would you change the regex to accommodate? – dimitriapostol Nov 03 '22 at 14:14
  • Figured it out : `aa(?<!lag|pctrange|max)` – dimitriapostol Nov 03 '22 at 14:36
  • Actually @CristianoSchiaffella my above solution doesn't work for single-letters: `a(?<!lag|pctrange|max)` doesn't work if i'm searching for "a" while `aa(?<!lag|pctrange|max)` does work for "aa". Do you have a solution that will work for both ? – dimitriapostol Nov 03 '22 at 15:45