0

Given a name_loc column of text like the following:

{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}

I'm trying to extract the names, ideally separated by commas:

Charlie, Wrigley, Ana

I've gotten this far:

SELECT SUBSTRING(CAST(name_loc AS VARCHAR) from '"([^ –]+)')
FROM table;

which returns

Charlie

How can I extend this query to extract all names?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
jmintz4
  • 13
  • 3

3 Answers3

1

You can do this with a combination of regexp_matches (to extract the names), array_agg (to regroup all matches in a row) and array_to_string (to format the array as you'd like, e.g. with a comma separator):

WITH input(name_loc) AS (
    VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
         , ('{"Other - somewhere}') -- added this to show multiple rows do not get merged
)
SELECT array_to_string(names, ', ')
FROM input
CROSS JOIN LATERAL (
    SELECT array_agg(name)
    FROM regexp_matches(name_loc, '"(\w+)', 'g') AS f(name)
) AS f(names);
array_to_string
Charlie, Wrigley, Ana
Other

View on DB Fiddle

Marth
  • 23,920
  • 3
  • 60
  • 72
  • 1
    Nice, you fiddle is empty though. – JvdV Aug 21 '22 at 14:46
  • 1
    Oops, looks like the "copy as markdown" feature adds a link to no fiddle if you didn't save it first. Thanks, I updated it. – Marth Aug 21 '22 at 14:52
  • 2
    Works now! Great. I also added an answer where I copy pasted your first few lines. As I'm rather new to postgreSQL I hope you don't mind. ALso, your 2nd line is missing an double quote =) – JvdV Aug 21 '22 at 15:46
0

My two cents, though I'm rather new to postgreSQL and I had to copy the 1st piece from @Marth's his answer:

WITH input(name_loc) AS (
    VALUES ('{"Charlie – White Plains, NY","Wrigley – Minneapolis, MN","Ana – Decatur, GA"}')
         , ('{"Other - somewhere"}') 
)
SELECT REGEXP_REPLACE(name_loc, '{?(,)?"(\w+)[^"]+"}?','\1\2', 'g') FROM input;
regexp_replace
Charlie,Wrigley,Ana
Other
JvdV
  • 70,606
  • 8
  • 39
  • 70
0

Your string literal happens to be a valid array literal.
(Maybe not by coincidence? And the column should be type text[] to begin with?)

If that's the reliable format, there is a safe and simple solution:

SELECT t.id, x.names
FROM   tbl t
CROSS  JOIN LATERAL (
   SELECT string_agg(split_part(elem, ' – ', 1), ', ') AS names
   FROM   unnest(t.name_loc::text[]) elem
   ) x;

Or:

SELECT id, string_agg(split_part(elem, ' – ', 1), ', ') AS names
FROM   (SELECT id, unnest(name_loc::text[]) AS elem FROM tbl) t 
GROUP  BY id;

db<>fiddle here

Steps

  1. Unnest the array with unnest() in a LATERAL CROSS JOIN, or directly in the SELECT list.

  2. Take the first part with split_part(). I chose ' – ' as delimiter, not just ' ', to allow for names with nested space like "Anne Nicole". See:

  3. Aggregate results with string_agg(). I added no particular order as you didn't specify one.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228