Requirement
I have a Postgres column containing values in two forms: personal and corporate names. Personal names contain a comma while corporate names do not.
_owner_titlecase
-------------------------
McCartney, James Paul
Lennon, John Winston Ono
Harrison, George
Starkey, Richard
The Beatles
I have to produce a query which abbreviates the personal names only, like this:
regexp_replace
-------------------------
McCartney, J P
Lennon, J W O
Harrison, G
Starkey, R
The Beatles
Background
After some performance tests, I've realised I can't use CASE
to treat the two row types differently (as in CASE WHEN _owner_titlecase ~ ',' regexp_replace...
). So I'm hoping there's a way to write a single regex which can treat the two types differently.
I previously asked about how to handle the initials part of personal names, and am now using (^\w+)|\Y\w
regex as follows:
, regexp_replace(_owner_titlecase
, '(^|;\s+)(\w+)|\Y\w'
, '\1', 'g')
Now I've widened the scope to look at the corporate names, of course The Beatles
is being abbreviated to The B
.
The \Y
is a Postgres regex character class which I learned about here which matches only at a point that is not the beginning or end of a word. While the special Postgres classes look like they could be of use in this situation, it would actually be useful to stick with regex features which are universal, so I can test them in places like regex101.com. Currently my only Postgres test platforms are somewhat awkward and provide no debugging help.
The whole story is that we have a CartoDB map upon which we want to superimpose a layer containing property owner names. Some properties are close together and the list of owner names can be long, hence the need to abbreviate.