Python Regex: How to find a substring

Question

I have a list of titles that I need to normalize. For example, if a title contains 'CTO', it needs to be changed to 'Chief Technology Officer'. However, I only want to replace 'CTO' if there is no letter directly to the left or right of 'CTO'. For example, 'Director' contains 'cto'. I obviously wouldn't want this to be replaced. However, I do want it to be replaced in situations where the title is 'Founder/CTO' or 'CTO/Founder'.

Is there a way to check if a letter is before 'CXO' using regex? Or what would be the best way to accomplish this task?

EDIT: My code is as follows...

import re

test = 'Co-Founder/CTO'
test = re.sub("[^a-zA-Z0-9]CTO", 'Chief Technology Officer', test)

The result is 'Co-FounderChief Technology Officer'. The '/' gets replaced for some reason. However, this doesn't happen if test = 'CTO/Co-Founder'.

Does this answer your question? [Python regex lookbehind and lookahead](https://stackoverflow.com/questions/47886809/python-regex-lookbehind-and-lookahead) — Cireo, Jun 14 '21 at 19:11

Wes Hardaker · Answer 1 · score 2 · 2021-06-14T19:09:38.333

What you want is a regex that requires a non-alphanumeric character immediately before CTO:

"[^a-zA-Z0-9]CTO"

But you actually also need to check for when CTO occurs at the beginning of the line:

"^CTO"

To use the first expression within re.sub, wrap the character class in a capturing group (parentheses) and reference it as \\1 in the replacement, so that the matched character (e.g., a space or /) is put back:

re.sub("([^a-zA-Z0-9])CTO","\\1Chief Technology Officer", "foo/CTO")

This results in:

'foo/Chief Technology Officer'
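
As a minimal sketch, the two expressions above can be folded into one capturing group by adding ^ as an alternative, so a single re.sub call covers both the start of the string and a non-alphanumeric prefix. The helper name normalize_cto and the sample titles are hypothetical, chosen only for illustration.

```python
import re

# Group 1 captures either the start of the string (an empty match) or the
# single non-alphanumeric character before CTO, so the replacement can
# put it back with \1.
CTO_WITH_PREFIX = re.compile(r"(^|[^a-zA-Z0-9])CTO")

def normalize_cto(title):
    """Hypothetical helper: expand CTO while preserving the preceding character."""
    return CTO_WITH_PREFIX.sub(r"\1Chief Technology Officer", title)

for title in ["Founder/CTO", "CTO/Founder", "Director", "aCTOrMan"]:
    print(normalize_cto(title))
```

Like the original expression, this only inspects the character before CTO, not the one after it.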

edited Jun 14 '21 at 19:09

answered Jun 14 '21 at 18:34

How would I implement this using re.sub()? In the case of 'Founder/CTO', the '/' gets replaced so the end result is 'FounderChief Technology Officer'. Or is there a better way other than re.sub()? – codr Jun 14 '21 at 18:45
Thanks, much appreciated. Just to clarify, the '\\1' in the replacement references the "([^a-zA-Z0-9])" grouping? – codr Jun 14 '21 at 19:22
That's correct. You can group things in parentheses and then reference whatever the group matched in the replacement. – Wes Hardaker Jun 14 '21 at 22:11

score 1 · Accepted Answer · answered Jun 14 '21 at 19:22

Answer: "(?<=[^a-zA-Z0-9])CTO|^CTO"

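Lookbehinds are perfect for this:

import re

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO")

Unfortunately, this won't match at the start of a line, because a positive lookbehind needs an actual preceding character. Python's re module also requires lookbehinds to be fixed-width, so the ^ case can't be added as an alternative inside the lookbehind.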

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
CTO/Bossy
aCTOrMan
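
The fixed-width restriction can be seen directly. The sketch below, which is an illustration rather than part of the original answer, tries to put ^ inside the lookbehind as an alternative. The two alternatives have different widths (0 for ^, 1 for the character class), so the pattern is rejected at compile time. The variable name compiled is an assumption for the demo.

```python
import re

# Attempt to fold the start-of-string case into the lookbehind itself.
# The alternatives have widths 0 (^) and 1 (character class), which
# Python's re module refuses to compile.
try:
    re.compile(r"(?<=^|[^a-zA-Z0-9])CTO")
    compiled = True
except re.error as exc:
    compiled = False
    print("re.error:", exc)
```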

You have to handle the start of the line explicitly with an alternation (|):

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO|^CTO")

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
Chief Technology Officer/Bossy
aCTOrMan
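
The question also asks that no letter directly follow CTO. One possible variation, not part of the answer above, is to use negative lookarounds: (?<![a-zA-Z]) also succeeds at the start of the string, so no ^ alternative is needed, and (?![a-zA-Z]) applies the same check on the right. This sketch excludes letters only, following the question's wording, whereas the answer's character class also excludes digits. The name cto_both_sides and the extra sample "CTOs" are assumptions for illustration.

```python
import re

# Negative lookbehind and lookahead: CTO must not be preceded or followed
# by a letter. Both assertions also succeed at the string boundaries.
cto_both_sides = re.compile(r"(?<![a-zA-Z])CTO(?![a-zA-Z])")

samples = ["Co-Founder/CTO", "CTO/Bossy", "aCTOrMan", "CTOs"]
results = [cto_both_sides.sub("Chief Technology Officer", s) for s in samples]
for r in results:
    print(r)
```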
