
I want to use the PRXCHANGE function in SAS to replace strings that match the pattern

[string ending in a lowercase letter][string beginning with an uppercase letter]

but only when the match does not come from a name such as "McCoy" or "MacDonald" (and possibly other exceptions, which I can hard-code). The entire string should be replaced with the second substring above. I can't quite figure out how to exclude only a subset of strings from the match.

Andreas Louv
Rookatu

1 Answer


You can use the following:

/\b(?!Mc|Mac)\w*[a-z][A-Z]\w*\b/

Testing in Perl:

$ cat file
MacDonald
John_doe
BillyTheRock
McCoy
$ perl -ne 's/\b(?!Mc|Mac)\w*[a-z][A-Z]\w*\b/__replaced__/;print;' file
MacDonald
John_doe
__replaced__
McCoy

Breakdown:

/
  \b           # word boundary
  (?!Mc|Mac)   # negative lookahead: the word must not start with Mc or Mac
  \w*          # zero or more word characters
  [a-z]        # a lowercase letter
  [A-Z]        # immediately followed by an uppercase letter
  \w*          # zero or more word characters
  \b           # word boundary
/x

See also: "What is a word boundary in regexes?"

Andreas Louv
Thanks, I ended up going with something similar. The problem was that if you use a lookbehind instead of a lookahead, SAS tells you that you are using a variable-length expression, which it doesn't support. +1 – Rookatu Nov 16 '15 at 19:47