
I have a list of titles that I need to normalize. For example, if a title contains 'CTO', it needs to be changed to 'Chief Technology Officer'. However, I only want to replace 'CTO' if there is no letter directly to the left or right of 'CTO'. For example, 'Director' contains 'cto'. I obviously wouldn't want this to be replaced. However, I do want it to be replaced in situations where the title is 'Founder/CTO' or 'CTO/Founder'.

Is there a way to check if a letter is before 'CXO' using regex? Or what would be the best way to accomplish this task?

EDIT: My code is as follows...

import re

test = 'Co-Founder/CTO'
test = re.sub("[^a-zA-Z0-9]CTO", 'Chief Technology Officer', test)

The result is 'Co-FounderChief Technology Officer'. The '/' gets replaced for some reason. However, this doesn't happen if test = 'CTO/Co-Founder'.

codr
  • Does this answer your question? [Python regex lookbehind and lookahead](https://stackoverflow.com/questions/47886809/python-regex-lookbehind-and-lookahead) – Cireo Jun 14 '21 at 19:11

2 Answers

2

What you want is a regex that requires the character immediately before CTO to be something other than a letter or digit:

"[^a-zA-Z0-9]CTO"

But you also need to handle the case where CTO occurs at the very beginning of the string, where there is no preceding character to match:

"^CTO"

To use the first expression within re.sub, wrap the character class in a capturing group (parentheses) and refer to it as \\1 in the replacement, which puts back the character that was matched (e.g., a space or /):

re.sub("([^a-zA-Z0-9])CTO","\\1Chief Technology Officer", "foo/CTO")

Will result in

'foo/Chief Technology Officer'
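
The two expressions above can probably be merged into one substitution by letting the group match either the start of the string or a single non-alphanumeric character. This is only a sketch: the helper name expand_cto is hypothetical and not part of the original answer.

```python
import re

# Group 1 is either the empty match at the start of the string or the
# single non-alphanumeric character sitting directly before "CTO".
# \1 in the replacement writes that character (or nothing) back.
_CTO = re.compile(r"(^|[^a-zA-Z0-9])CTO")

def expand_cto(title):
    """Hypothetical helper: expand 'CTO' unless a letter or digit precedes it."""
    return _CTO.sub(r"\1Chief Technology Officer", title)

for title in ("Co-Founder/CTO", "CTO/Co-Founder", "Director", "aCTOrMan"):
    print(expand_cto(title))
```

Note that this still does not look at the character after CTO, so a title such as 'CTOs' would also be expanded.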
Wes Hardaker
  • How would I implement this using re.sub()? In the case of 'Founder/CTO', the '/' gets replaced so the end result is 'FounderChief Technology Officer'. Or is there a better way other than re.sub()? – codr Jun 14 '21 at 18:45
  • Thanks, much appreciated. Just to clarify, the '\\1' in the replacement references the "([^a-zA-Z0-9])" grouping? – codr Jun 14 '21 at 19:22
  • That's correct. You can group things in ()s and then extract whatever they matched later. – Wes Hardaker Jun 14 '21 at 22:11
1

Answer: "(?<=[^a-zA-Z0-9])CTO|^CTO"

Lookbehinds are perfect for this

import re
cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO")

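but unfortunately this won't match at the start of the string, because the lookbehind requires a character to be present before CTO (and Python's re module only accepts fixed-width lookbehinds, so ^ can't be folded in as an alternative).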

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
CTO/Bossy
aCTOrMan

You would have to check for that explicitly via |:

cto_re = re.compile("(?<=[^a-zA-Z0-9])CTO|^CTO")
for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan":
    print(cto_re.sub("Chief Technology Officer", eg))

Co-Founder/Chief Technology Officer
Chief Technology Officer/Bossy
aCTOrMan
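
The question also asks that no letter sit directly to the right of CTO, which the patterns above don't check. One possible variation, offered as a sketch rather than as part of the original answer, is to use negative lookarounds on both sides. A negative lookbehind succeeds at the start of the string, so no ^ alternative should be needed. The name normalize_title is hypothetical, and the character class covers ASCII letters only.

```python
import re

# (?<![a-zA-Z]) asserts that no ASCII letter sits directly before "CTO";
# (?![a-zA-Z]) asserts that no ASCII letter sits directly after it.
# Both are zero-width, so neighbouring characters such as '/' are kept.
_CTO_BOTH_SIDES = re.compile(r"(?<![a-zA-Z])CTO(?![a-zA-Z])")

def normalize_title(title):
    """Hypothetical helper: expand a standalone 'CTO' in a job title."""
    return _CTO_BOTH_SIDES.sub("Chief Technology Officer", title)

for eg in "Co-Founder/CTO", "CTO/Bossy", "aCTOrMan", "CTOs":
    print(normalize_title(eg))
```

With this pattern the first two titles are expanded, while 'aCTOrMan' and 'CTOs' are left alone.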
Cireo