0

I'm trying to solve the same problem as in this question, but this time in SQL Server 2014. I need to check if strings are made out of the same words:

Returns true for:

Antoine de Saint-Exupéry = de Saint-Exupéry Antoine = Saint-Exupéry Antoine de = etc.

and

Returns false for:

Antoine de Saint-Exupéry != Antoine de Saint != Antoine Antoine de Saint-Exupéry != etc.

What are my options in SQL Server 2014? Is there a built-in function for such comparison?

Community
  • 1
  • 1
jahu
  • 5,427
  • 3
  • 37
  • 64

2 Answers2

2

To compare 2 strings, one could abuse use the sorting capability in XQuery.

Cast the string to an XML, sort the elements and then return a string without the tags.

For example:

DECLARE @Words1 NVARCHAR(MAX) = N'Antoine de Saint-Exupéry';
DECLARE @Words2 NVARCHAR(MAX) = N'Saint-Exupéry Antoine de';

DECLARE @SortedWords1 NVARCHAR(MAX) = cast('<x>'+replace(@Words1,' ','</x><x>')+'</x>' as XML).query('for $x in /x order by $x ascending return $x').value('.','nvarchar(max)');
DECLARE @SortedWords2 NVARCHAR(MAX) = cast('<x>'+replace(@Words2,' ','</x><x>')+'</x>' as XML).query('for $x in /x order by $x ascending return $x').value('.','nvarchar(max)');

DECLARE @SameWords BIT = (case 
                          when @SortedWords1 = @SortedWords2
                          then 1 
                          else 0 
                          end);


SELECT @SameWords as SameWords;

Returns:

SameWords
---------
True 
LukStorms
  • 28,916
  • 5
  • 31
  • 45
1

Here is one way you could roll your own for this. I am using the string splitter from Jeff Moden. You can find the original article here. http://www.sqlservercentral.com/articles/Tally+Table/72993/. If you don't like that splitter there are some other great versions here. https://sqlperformance.com/2012/07/t-sql-queries/split-strings. I like the one from Jeff Moden because unlike any of the other splitters you get the ItemNumber returned which in some cases is incredibly useful.

CREATE FUNCTION [dbo].[DelimitedSplit8K]
--===== Define I/O parameters
        (@pString VARCHAR(8000), @pDelimiter CHAR(1))
--WARNING!!! DO NOT USE MAX DATA-TYPES HERE!  IT WILL KILL PERFORMANCE!
RETURNS TABLE WITH SCHEMABINDING AS
 RETURN
--===== "Inline" CTE Driven "Tally Table" produces values from 1 up to 10,000...
     -- enough to cover VARCHAR(8000)
  WITH E1(N) AS (
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
                 SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
                ),                          --10E+1 or 10 rows
       E2(N) AS (SELECT 1 FROM E1 a, E1 b), --10E+2 or 100 rows
       E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10E+4 or 10,000 rows max
 cteTally(N) AS (--==== This provides the "base" CTE and limits the number of rows right up front
                     -- for both a performance gain and prevention of accidental "overruns"
                 SELECT TOP (ISNULL(DATALENGTH(@pString),0)) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
                ),
cteStart(N1) AS (--==== This returns N+1 (starting position of each "element" just once for each delimiter)
                 SELECT 1 UNION ALL
                 SELECT t.N+1 FROM cteTally t WHERE SUBSTRING(@pString,t.N,1) = @pDelimiter
                ),
cteLen(N1,L1) AS(--==== Return start and length (for use in substring)
                 SELECT s.N1,
                        ISNULL(NULLIF(CHARINDEX(@pDelimiter,@pString,s.N1),0)-s.N1,8000)
                   FROM cteStart s
                )
--===== Do the actual split. The ISNULL/NULLIF combo handles the length for the final element when no delimiter is found.
 SELECT ItemNumber = ROW_NUMBER() OVER(ORDER BY l.N1),
        Item       = SUBSTRING(@pString, l.N1, l.L1)
   FROM cteLen l
;

The basic concept here is that you have to split your strings into words and then do a comparison. I used a couple of ctes so it is move obvious the process of how this works. The following works for all of the examples you posted.

declare @Phrase1 nvarchar(100) = 'Antoine de Saint-Exupéry'
    , @Phrase2 nvarchar(100) = 'de Saint-Exupéry Antoine'
;

with Phrase1 as
(
    select * 
    from DelimitedSplit8K(@Phrase1, ' ')
)
, Phrase2 as
(
    select * 
    from DelimitedSplit8K(@Phrase2, ' ')
)

select PhrasesEqual = convert(bit, case when count(*) > 0 then 1 else 0 end)
from Phrase1 p1
full outer join Phrase2 p2 on p2.Item = p1.Item
where p1.Item is null
    or p2.Item is null
;
Sean Lange
  • 33,028
  • 3
  • 25
  • 40