
I have to clean up a column of company names by removing Inc, Ltd, &Co, Co, Corp, ".", "$", "&", etc. The list may be updated later on.

In SQL Server 2016 I used REPLACE, but it replaces those letters everywhere, not only when they form a separate word (for example, "name Co").

    alter table [CompanyList] add CleanLegalName as
        cast (Rtrim( Replace (Replace (Replace (Replace (Replace (Replace (Replace (Replace (REPLACE
            ([Legal Name], ' INC', ''), '.', ''), ' LTD', ''), ' Inc', ''), ' Ltd', ''), ' LIMITED', ''), ' INCOPORATED' ,'') , ',' , '') , ' CO', '')
          ) as varchar(200))

The problem is that REPLACE removes these letters wherever they occur: 'Jane Construction' becomes 'Jane nstruction' and 'Inca Food' becomes 'a Food'. How can I remove these letters only when they form a word by themselves, and not when they are part of another word? Thank you.

sticky bit
Dmana

1 Answer


Let's say that we have the following scenario

    CREATE TABLE #Temp([Legal Name] NVARCHAR(MAX))

    INSERT INTO #Temp ([Legal Name]) VALUES ('Beer Acme Co.')
    INSERT INTO #Temp ([Legal Name]) VALUES ('Company of Brothers Co')
    INSERT INTO #Temp ([Legal Name]) VALUES ('My Star Inc.')
    INSERT INTO #Temp ([Legal Name]) VALUES ('Incredible Monsters INC')
    INSERT INTO #Temp ([Legal Name]) VALUES ('Limit Is The Sky Ltd')
    INSERT INTO #Temp ([Legal Name]) VALUES ('Limit Is The Universe Ltd.')
    INSERT INTO #Temp ([Legal Name]) VALUES ('Unlimited Minds Limited.')
    INSERT INTO #Temp ([Legal Name]) VALUES ('Unlimited Borders Limited')

I can propose two approaches.

Option 1

Via a scalar-valued function

    CREATE FUNCTION [dbo].[GetClearedName](@VALUE NVARCHAR(MAX))
    RETURNS NVARCHAR(MAX)
    AS
        BEGIN

            DECLARE @PATTERN NVARCHAR(MAX)
            DECLARE @TEMP TABLE ([PATTERN] NVARCHAR(MAX))

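            -- Each pattern starts with a space and has no trailing wildcard, so a
            -- suffix only matches as a separate word at the end of the name
            -- (for example, 'Jane Construction' is left alone).
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Co.')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Co')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Inc.')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Inc')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Ltd')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Ltd.')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Limited.')
            INSERT INTO @TEMP ([PATTERN]) VALUES ('% Limited')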

            DECLARE @RESULT NVARCHAR(MAX)

            SET @RESULT = @VALUE

            DECLARE PATTERN_CURSOR CURSOR FOR SELECT [PATTERN] FROM @TEMP

            OPEN PATTERN_CURSOR
            FETCH NEXT FROM PATTERN_CURSOR INTO @PATTERN

            WHILE (@@FETCH_STATUS = 0)
                BEGIN


                    IF (PATINDEX(@PATTERN, @RESULT) > 0)
                        BEGIN
                            SET @RESULT = SUBSTRING(@RESULT, 0, PATINDEX(@PATTERN, @RESULT))                
                        END

                    FETCH NEXT FROM PATTERN_CURSOR INTO @PATTERN
                END

            CLOSE PATTERN_CURSOR
            DEALLOCATE PATTERN_CURSOR

            IF (LEN(@RESULT) <> 0)
                RETURN @RESULT 

            Return @VALUE
        END

You can use the function like this:

    SELECT [dbo].[GetClearedName]([Legal Name]) FROM #Temp

    DROP TABLE #Temp
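Since the question mentions that the suffix list may change later, here is a minimal set-based sketch of the same idea that keeps the suffixes in a table instead of hard-coding them in a function. It is not part of the original answer: `@Names` and `@Suffix` are hypothetical names chosen for illustration, and the sketch assumes a case-insensitive collation and that at most one trailing suffix (optionally followed by a period) needs to be removed.

```sql
-- Illustrative data; @Names and @Suffix are hypothetical, not from the original post.
DECLARE @Names TABLE ([Legal Name] NVARCHAR(200) NOT NULL);
INSERT INTO @Names ([Legal Name])
VALUES (N'Beer Acme Co.'), (N'Incredible Monsters INC'),
       (N'Jane Construction'), (N'Inca Food');

-- The suffix list lives in a table, so it can be extended without changing the query.
DECLARE @Suffix TABLE (Suffix NVARCHAR(50) NOT NULL);
INSERT INTO @Suffix (Suffix)
VALUES (N'Co'), (N'Inc'), (N'Ltd'), (N'Limited');

SELECT t.[Legal Name],
       -- Fall back to the original name when no suffix matched.
       COALESCE(c.Cleaned, t.[Legal Name]) AS [Cleared Name]
FROM @Names AS t
CROSS APPLY (
    -- Drop one trailing period so 'Co.' and 'Co' are handled alike.
    SELECT CASE WHEN RIGHT(RTRIM(t.[Legal Name]), 1) = N'.'
                THEN LEFT(RTRIM(t.[Legal Name]), LEN(RTRIM(t.[Legal Name])) - 1)
                ELSE RTRIM(t.[Legal Name])
           END AS Name
) AS n
OUTER APPLY (
    -- Match a suffix only when it is a separate word at the end of the name;
    -- prefer the longest suffix if several match.
    SELECT TOP (1)
           LEFT(n.Name, LEN(n.Name) - LEN(s.Suffix) - 1) AS Cleaned
    FROM @Suffix AS s
    WHERE n.Name LIKE (N'% ' + s.Suffix)
    ORDER BY LEN(s.Suffix) DESC
) AS c;
```

Under those assumptions this should return 'Beer Acme' and 'Incredible Monsters' for the first two rows and leave 'Jane Construction' and 'Inca Food' unchanged. Suffixes that contain LIKE wildcards (`%`, `_`, `[`) would need escaping before being used this way.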

Option 2

Using the SQL# library (note the `SQL#.` schema prefix in the calls below). This can also be wrapped in a function.

    SELECT [Legal Name],
        -- Each pattern strips one suffix, with an optional trailing period,
        -- only when it is a separate word at the end of the name.
        SQL#.RegEx_Replace4k(
            SQL#.RegEx_Replace4k(
                SQL#.RegEx_Replace4k(
                    SQL#.RegEx_Replace4k([Legal Name], N'\sCO\.?$', N' ', -1, 1, 'IgnoreCase'),
                    N'\sINC\.?$', N' ', -1, 1, 'IgnoreCase'),
                N'\sLTD\.?$', N' ', -1, 1, 'IgnoreCase'),
            N'\sLIMITED\.?$', N' ', -1, 1, 'IgnoreCase')
        AS [Cleared Name]
    FROM
        #Temp

    DROP TABLE #Temp
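The nested calls above could probably be collapsed into a single call by listing the suffixes as alternatives in one pattern. The following is only a sketch: it assumes SQL# is installed and reuses the argument order shown in the calls above, and the inline `VALUES` list is illustrative data rather than anything from the original post.

```sql
-- Sketch only: requires the SQL# library; argument order mirrors the calls above.
SELECT n.[Legal Name],
       SQL#.RegEx_Replace4k(
           n.[Legal Name],
           -- whitespace, one of the suffixes, an optional period, then end of string
           N'\s+(CO|INC|LTD|LIMITED)\.?\s*$',
           N'',
           -1, 1, 'IgnoreCase') AS [Cleared Name]
FROM (VALUES (N'Beer Acme Co.'),
             (N'Jane Construction'),
             (N'Inca Food')) AS n([Legal Name]);
```

Because the pattern requires leading whitespace and is anchored at the end of the string, names such as 'Jane Construction' or 'Inca Food' should not be affected. Adding a new suffix then only means extending the alternation.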

Expected results


Flavio Francisco