0

I'm required to split names to insert into the database. Some sample strings are given below.

Covey, Stephen J, Mr
Clinton, Hilary B,
Obama, Barack, Mr

I need to split the above string like follows.

First_Name       Middle_Initial      Last_NAME     Title
 Stephen               J               Covey         Mr
 Hilary                B               Clinton
 Barack                                Obama         Mr

I can achieve the above using java code. Is is possible to split those strings directly in a sql select.?

Please note that, Few names might not have title or middle initial. I need the output as given above.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
SqlNerd
  • 65
  • 1
  • 5
  • Why don't you just create a function based on your java code? – Christian Aug 18 '15 at 13:01
  • Yes. It has already been done. Trying to learn if that is possible using sql, so that i can do the changes in sql and measure performance – SqlNerd Aug 18 '15 at 13:03
  • @SqlNerd 8 hours back you posted same question like: http://stackoverflow.com/questions/32063884/how-do-i-split-a-single-row-into-multiple-rows-and-insert-into-a-table-in-oracle. – Tharunkumar Reddy Aug 18 '15 at 13:12
  • @Tarun That is a different questions and yeah it is for the same work. But, I need alternatives too. – SqlNerd Aug 18 '15 at 13:33
  • @SqlNerd Parsing a person's name is asked here periodically. Here is an example, http://stackoverflow.com/questions/371490/oracle-sql-parsing-a-name-string-and-converting-it-to-first-initial-last-nam – Patrick Bacon Aug 18 '15 at 13:52

3 Answers3

2

SQL Fiddle

Oracle 11g R2 Schema Setup:

CREATE TABLE Names ( Name ) AS
          SELECT 'Covey, Stephen J, Mr' FROM DUAL
UNION ALL SELECT 'Clinton, Hilary B,' FROM DUAL
UNION ALL SELECT 'Obama, Barack, Mr' FROM DUAL

Query 1:

SELECT REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 1 ) AS Last_Name,
       REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 2 ) AS First_Name,
       REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 4 ) AS Middle_Initial,
       REGEXP_SUBSTR( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', 1, 1, NULL, 5 ) AS Title
FROM   Names

Results:

| LAST_NAME | FIRST_NAME | MIDDLE_INITIAL |  TITLE |
|-----------|------------|----------------|--------|
|     Covey |    Stephen |              J |     Mr |
|   Clinton |     Hilary |              B | (null) |
|     Obama |     Barack |         (null) |     Mr |

Query 2:

SELECT REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\1' ) AS Last_Name,
       REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\2' ) AS First_Name,
       REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\4' ) AS Middle_Initial,
       REGEXP_REPLACE( Name, '^(.*?),\s*(.*?)(\s+(\w))?,\s*(.*)$', '\5' ) AS Title
FROM   Names

Results:

| LAST_NAME | FIRST_NAME | MIDDLE_INITIAL |  TITLE |
|-----------|------------|----------------|--------|
|     Covey |    Stephen |              J |     Mr |
|   Clinton |     Hilary |              B | (null) |
|     Obama |     Barack |         (null) |     Mr |

Query 3:

WITH Split_Names AS (
  SELECT REGEXP_SUBSTR( Name, '^[^,]+' ) AS Last_Name,
         REGEXP_REPLACE( Name, '^.*?,\s*|\s*,.*?$' ) AS Given_Names,
         REGEXP_SUBSTR( Name, '[^\s,]+$' ) AS Title
  FROM   Names
)
SELECT Last_Name,
       REGEXP_REPLACE( Given_Names, '\s+\w$' ) AS First_Name,
       TRIM( REGEXP_SUBSTR( Given_Names, '\s+\w$' ) ) AS Middle_Initial,
       Title
FROM   Split_Names

Results:

| LAST_NAME | FIRST_NAME | MIDDLE_INITIAL |  TITLE |
|-----------|------------|----------------|--------|
|     Covey |    Stephen |              J |     Mr |
|   Clinton |     Hilary |              B | (null) |
|     Obama |     Barack |         (null) |     Mr |
MT0
  • 143,790
  • 11
  • 59
  • 117
  • `REGEXP_SUBSTR` was expanded in Oracle 11.1 ([documentation here](http://docs.oracle.com/cd/B28359_01/server.111/b28286/functions138.htm)) to take a 6th argument which returns the matching sub-expression (capture group). This will only work in Oracle 11 or later versions - but the question wasn't tagged with a particular version. – MT0 Aug 18 '15 at 17:31
  • Added two more versions that should work in Oracle 10. – MT0 Aug 18 '15 at 18:36
  • @SqlNerd: make sure you test with a data sample from your data and include the unexpected too like null elements, names with spaces, single quotes, hyphens in them, etc. – Gary_W Aug 18 '15 at 18:43
  • @Gary_W All those things are handled by all of my solutions above [SQLFIDDLE](http://sqlfiddle.com/#!4/a32e0/1) – MT0 Aug 18 '15 at 22:06
  • This will fail when the name changes as Covey, Mr., Stephen J right? – SqlNerd Aug 20 '15 at 11:03
  • It would ouput `LAST_NAME = Covey`, `FIRST_NAME = Mr.`, `MIDDLE_INITIAL = null`, `TITLE = Stephen J` (with the fields in the wrong order) - which is to be expected since the input data is in the wrong order. All the solutions listed here will have that issue - you will need to do some sanity checking of the data coming in (or even better, get the users to input each field separately). – MT0 Aug 20 '15 at 11:09
1

Use regexp_substr (DB >=10g):

SELECT TRIM( REGEXP_SUBSTR(input_row, '[^,]+', 1, 1)) AS Last_NAME,
TRIM( REGEXP_SUBSTR( REGEXP_SUBSTR(input_row, '[^,]+', 1, 2), '[^ ]+', 1, 1)) AS First_Name,
TRIM( REGEXP_SUBSTR( REGEXP_SUBSTR(input_row, '[^,]+', 1, 2), '[^ ]+', 1, 2)) AS Middle_Initial,
TRIM( REGEXP_SUBSTR(input_row, '[^,]+', 1, 3)) AS Title
FROM source_table;
Alexander Ushakov
  • 5,139
  • 3
  • 27
  • 50
  • 1
    Warning! The regex of the format '[^,]+' commonly used for parsing delimited strings fails when there are NULL elements in the list. Make sure you test thoroughly. Always expect the unexpected! See here for more details and an improved regex: https://stackoverflow.com/questions/31464275/split-comma-separated-values-to-columns-in-oracle/31464699#31464699 – Gary_W Aug 18 '15 at 14:00
  • Perfect.!! Thanks. :) – SqlNerd Aug 18 '15 at 14:05
  • @Gary_W are you trying to say, if the input_row is null i.e. if the input string is null, the regex will fail? – SqlNerd Aug 18 '15 at 14:06
  • No, if an element in that list is null, and you are after that value or one after it (see the link I provided for more info). i.e. Try this as input data: `'Clinton,,Ms'`. The first name will end up being 'Ms'. It will return the wrong data which is worse than it just failing! – Gary_W Aug 18 '15 at 14:14
  • This will also fail to properly parse a two part first name such as Miss Mary Kate Anne De Laney where the last name is De Laney (with an embedded space), Mary Kate is the first name (also with an embedded space), and Anne is the middle name. This would likely be formated as De Laney, Mary Kate A, Miss – Sentinel Aug 18 '15 at 15:04
1

This is Alexander's answer modified with an improved regex that handles NULL list elements. Oh and instead of repeating that regex, make it reusable by creating a function as described here: REGEX to select nth value from a list, allowing for nulls, then call that. That way the code is encapsulated and reusable by all with only one place to change code if you have to:

SQL> with tbl(input_row) as (
   select 'Covey, Stephen J, Mr' from dual
   union
   select 'Clinton,,Ms' from dual
   union
   select 'Obama, Barack, Mr' from dual
   )
   SELECT TRIM( REGEXP_SUBSTR(input_row, '([^,]*)(,|$)', 1, 1, NULL, 1)) AS Last_NAME,
   TRIM( REGEXP_SUBSTR( REGEXP_SUBSTR(input_row, '([^,]*)(,|$)', 1, 2, NULL, 1), '[^ ]+', 1, 1)) AS First_Name,
   TRIM( REGEXP_SUBSTR( REGEXP_SUBSTR(input_row, '([^,]*)(,|$)', 1, 2, NULL, 1), '[^ ]+', 1, 2)) AS Middle_Initial,
   TRIM( REGEXP_SUBSTR(input_row, '([^,]*)(,|$)', 1, 3, NULL, 1)) AS Title
   FROM tbl;

LAST_NAME            FIRST_NAME           MIDDLE_INITIAL       TITLE
-------------------- -------------------- -------------------- --------------------
Clinton                                                        Ms
Covey                Stephen              J                    Mr
Obama                Barack                                    Mr

SQL>
Gary_W
  • 9,933
  • 1
  • 22
  • 40
  • MT0's is the best answer. It handles embedded spaces in last_names, first_names and NULL list elements and uses 2 less calls to regexp_substr. – Gary_W Aug 18 '15 at 15:49