4

I'm trying to clean up data in a "Name" field in our database and split that data into FirstName, MiddleName, & LastName. Currently, I'm using a Case statement to look for various triggers inside the text to help me format the output in a certain way.

However, I'm starting to notice that I'm nesting tests inside other tests and need to figure out how to process data recursively. See this example of how I'm extracting the FirstName.

Case 
  When CharIndex(' ',LTrim(RTrim(Name))) in (0,1) Then '' --'empty or LName'
  When Left(Name,3) IN ('MR ','MS ', 'DR ','MRS') Then    --'Prefix Titles'
    Case --'If we found a prefix, run the same "tests" with the prefix removed'
      When CharIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))
        in (0,1) Then '' 
      When SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),3,1) 
        = '&' Then SubString(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',
        Name)))),1,5)
      Else Left(LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))),
        CHarIndex(' ',LTrim(RTrim(Right(Name,Len(Name)-CharIndex(' ',Name)))))-1) 
    End
  When SubString(LTrim(RTrim(Name)),3,1) = '&' Then 
    SubString(LTrim(RTrim(Name)),1,5) --'Look for initials e.g. J & A Smith'
  Else Left(LTrim(RTrim(Name)),CHarIndex(' ',LTrim(RTrim(Name)))-1) 
  End

So, in order to get this to work in more complicated situations (e.g. MR JOHN A SMITH JR), I would need to test recursively. In imperative programming, I would do something like this if I had a function called GetFirstName:

GetFirstName('MR JOHN A SMITH JR')
//GetFirstName identfies 'MR' and within the function it calls:
||
\\
   ==> GetFirstName('JOHN A SMITH JR')
       //GetFirstName identifies 'JR' and within the function it calls:
       ||
       \\
          ==> GetFirstName('JOHN A SMITH')
              //Finally, it returns 'JOHN'

Ideally, it would be great to do this in straight SQL, but I'm not sure that is possible. What alternatives would I have if I'm not using straight SQL? (I'm using SQL Server 2005)

Ben McCormack
  • 32,086
  • 48
  • 148
  • 223

2 Answers2

1

I don't think it is something easy/clean to accomplish in straight SQL. You can use regular expressions though however you have to write your own CLR function to provide regex functionality.

Umut Utkan
  • 648
  • 4
  • 11
0

This sounds like a one-time activity. Are you sure you can't do this in several statements using temp tables? If it is a one-time activity, then likely you have a higher need for accuracy and simplicity of code debugging than performance.

Consider something like this:

CREATE TABLE #MyNames (
  PersonID INT PRIMARY KEY,
  OriginalName VARCHAR(50),
  WorkingName VARCHAR(50),
  CandidateTitle VARCHAR(10),
  CandidateLastName VARCHAR(50),
  CandidateFirstName VARCHAR(50),
  CandidateMiddleName VARCHAR(50)
  --Your other candidate fields.....
)

INSERT INTO #MyNames (PersonID, OriginalName)
 SELECT TOP 100 ID, LTRIM(RTRIM(Name)) from SourcePersonTable

--Possibly add some indexes here for original name

UPDATE #MyNames
 SET CandidateTitle = LEFT(OriginalName,3),
     WorkingName = SUBSTRING(OriginalName,4,9999)
Where LEFT(OriginalName,3) IN
    ('MR ','MRS','MS ','DR ')

-- etc...

Just keep adding steps and editing the [WorkingName] field. When you're done with one pass, just do...

UPDATE #MyNames SET WorkingName = OriginalName

...and you're ready to do another sweep.

NYCdotNet
  • 4,500
  • 1
  • 25
  • 27
  • Thanks for the suggestion. I suppose using temp tables is a possibility, but it forces me to write code multiple times. Because the contents of the name field could trigger multiple "rules" at the same time, it seems to me that a recursive implementation would be advantageous. – Ben McCormack Jan 08 '11 at 16:05