1

I am working on a dtb makros model with SQL. I have a table with contacts and someone wrote Dr. in the firstname-colum, e.g. Dr. Anna. I only want the firstname without the Dr., so I just want the name Anna in the firstname column.

I have tried to get rid of it with


regexp_replace(trim(firstname), '\bDr.\b', '')

but it is not working. On the other hand, when I tried to get rid of Pfarrer (which is the German expression for priest) Anna, it worked.

I used


regexp_replace(trim(firstname), '\bPfarrer\b', '')

for this. I only got Anna back, as I wanted.

Why does it work with Pfarrer and not with Dr.?

DataVE
  • 59
  • 8
  • 4
    It is a common regex issue. `\bDr\.\b` will only match in `Dr.A`, but not in `Dr. A` as a space is not a word char. Remove the last `\b`, or replace with `(?!\B\w)`. – Wiktor Stribiżew Jan 19 '23 at 10:21
  • 1
    https://stackoverflow.com/questions/18004955/regex-using-word-boundary-but-word-ends-with-a-period – bobble bubble Jan 19 '23 at 10:28
  • Removing the last \b worked for me, thank you! – DataVE Jan 19 '23 at 10:29
  • You can also keep the last `\b` but add optional space before, if you don't want to remove standalone `Dr.` which is hypothetical. E.g. [`\bDr\. *\b`](https://regex101.com/r/ZVoNrn/1) will only match `Dr.` if followed by another word. Further don't forget to escape the dot from its special regex meaning, else it matches (almost) any charater. – bobble bubble Jan 19 '23 at 10:34

1 Answers1

1

Generally speaking, you might want to remove all titles, including Mr., Mrs., Dr., and anything else following this pattern:

SELECT REGEXP_REPLACE(TRIM(firstname), '^\w+\.[ ]*', '') AS firstname
FROM yourTable;
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360