Remove Duplicate adjacent Sub-String from String in Microsoft SQL Server

Question

I am using SQL Server 2008 and I have a column in a table, which has values like below. It basically shows departure and arrival information.

-->Heathrow/Dublin*Dublin/Heathrow

-->Gatwick/Liverpool*Liverpool/Carlisle *Carlisle/Gatwick

-->Heathrow/Dublin*Liverpool/Heathrow

(The 3rd example shown above is slightly different where the person did not depart from Dublin, instead departed from a Liverpool).

This makes the column too lengthy, and I want to remove only the adjacent duplicates, so the information can be shown like below:

-->Heathrow/Dublin/Heathrow

-->Gatwick/Liverpool/Carlisle/Gatwick

-->Heathrow/Dublin***Liverpool/Heathrow

So, this would still show the correct travel route, but omits only the contiguous duplicates. Also, in the 3rd case, since the departure and arrival information location is not the same, Iwould like to show it as ***.

I found a post here that removes all duplicates (Find and Remove Repeated Substrings) but this is slightly different from the solution that I need.

Could someone share any thoughts please?

score 0 · Answer 1 · edited May 23 '17 at 12:12

The first step is to adapt the process defined in the following link so that it splits based on /:

T-SQL split string

This returns a table which you would then loop through checking if the value contains an *. In that case you would get the text values before and after the * and compare them. Use CHARINDEX to get the position of the *, and SUBSTRING to get the values before and after. Once you have those check both values and append to your output string accordingly.

score 0 · Answer 2 · answered Jan 29 '15 at 21:37

So you have a database column that contains this text string? Is your concern to display the data to the user in a new format, or to update the data in your database table with a new value?

Do you have access to the original data from which this text string was built? It would probably be easier to re-create the string in the format you desire than it would be to edit the existing string programmatically.

If you don't have access to this data, it would probably be a lot simpler to update your data (or reformat it for display) if you do the string manipulation in a high-level language such as c# or java.

If you're reformatting it for display, write the string manipulation code in whatever language is appropriate, right before displaying it. If you're updating your table, you could write a program to process the table, reading each record, building the replacement string, and updating the record before moving on to the next one.

The bottom line is that T-SQL is just not a good language for doing this sort of string examination and manipulation. If you can build a fresh string from the original data, or do your manipulation in a high-level language, you'll have an easier job of it and end up with more maintainable code.

Dudi Konfino · Answer 3 · 2015-02-14T18:46:54.670

I wrote a code for the first example you gave. You still need to improve it for the rest ...

DECLARE @STR VARCHAR(50)='Heathrow/Dublin*Dublin/Heathrow'

IF  (SELECT SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1)) =

   (SELECT SUBSTRING(@STR,CHARINDEX('*',@STR)+1,LEN(SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1))))  

   BEGIN
    SELECT STUFF(@STR,CHARINDEX('*',@STR),LEN(SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1))+1,'')
END
ELSE
BEGIN
SELECT STUFF(@STR,CHARINDEX('*',@STR),LEN(SUBSTRING(@STR,CHARINDEX('*',@STR)+1,LEN(SUBSTRING(@STR,CHARINDEX('/',@STR)+1,CHARINDEX('*',@STR)-CHARINDEX('/',@STR)-1)))),'***')    
END

Remove Duplicate adjacent Sub-String from String in Microsoft SQL Server

3 Answers3