2

ACT and CAT are anagrams

I have to Write a function in sql server that takes 2 strings and given a Boolean output that indicates whether the both of them are anagram or not.

This doesnt make sense to do it in sql server,but,it is for learning purpose only

BlindCat
  • 295
  • 1
  • 2
  • 8
  • 6
    This site is not designed to make your homework. Please post at least a try you already did by yourself. – SQL Police Aug 05 '16 at 10:00
  • The two strings should have the same length for a start. If they do, you'll have to split the strings into individual letters. Probably best to put the letters in tables. Then you have to check they have the same letters, and the same number per letter. – HoneyBadger Aug 05 '16 at 10:01

5 Answers5

8

SQL Server is not good at this kind of things, but here you are:

WITH Src AS
(
    SELECT * FROM (VALUES
    ('CAT', 'ACT'),
    ('CAR', 'RAC'),
    ('BUZ', 'BUS'),
    ('FUZZY', 'MUZZY'),
    ('PACK', 'PACKS'),
    ('AA', 'AA'),
    ('ABCDEFG', 'GFEDCBA')) T(W1, W2)
), Numbered AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) Num
    FROM Src
), Splitted AS
(
    SELECT Num, W1 Word1, W2 Word2, LEFT(W1, 1) L1, LEFT(W2, 1) L2, SUBSTRING(W1, 2, LEN(W1)) W1, SUBSTRING(W2, 2, LEN(W2)) W2
    FROM Numbered
    UNION ALL
    SELECT Num, Word1, Word2, LEFT(W1, 1) L1, LEFT(W2, 1) L2, SUBSTRING(W1, 2, LEN(W1)) W1, SUBSTRING(W2, 2, LEN(W2)) W2
    FROM Splitted
    WHERE LEN(W1)>0 AND LEN(W2)>0
), SplitOrdered AS
(
    SELECT *,
        ROW_NUMBER() OVER (PARTITION BY Num ORDER BY L1) LNum1,
        ROW_NUMBER() OVER (PARTITION BY Num ORDER BY L2) LNum2
    FROM Splitted
)
SELECT S1.Num, S1.Word1, S1.Word2, CASE WHEN COUNT(*)=LEN(S1.Word1) AND COUNT(*)=LEN(S1.Word2) THEN 1 ELSE 0 END Test
FROM SplitOrdered S1
JOIN SplitOrdered S2 ON S1.L1=S2.L2 AND S1.Num=S2.Num AND S1.LNum1=S2.LNum2
GROUP BY S1.Num, S1.Word1, S1.Word2

And results:

1   CAT     ACT     1
2   CAR     RAC     1
3   BUZ     BUS     0
4   FUZZY   MUZZY   0
5   PACK    PACKS   0
6   AA      AA      1
7   ABCDEFG GFEDCBA 1
Paweł Dyl
  • 8,888
  • 1
  • 11
  • 27
  • Wow, very nice query! Tried it, works super, also with duplicate characters (see my other comment). BTW, also learned something new, didn't know about the `values` clause in `select`. – SQL Police Aug 05 '16 at 10:51
  • 1
    @Tanner Yeah, but this is also a Q&A site which gives benefits to other users. From this view, this is a perfectly ready query which can be used by copy-paste, and also you can benefit by analyzing it, and therefore, it should be **upvoted!** – SQL Police Aug 05 '16 at 10:55
  • I am sorry, but while this is a cool use of CTE this query runs rather slow (on my server at least). 160 000 rows in 24 sec.. I created a function with a loop, and it dose the job in 0.something sec. Maybe someone else also can test/confirm this? – Jarle Bjørnbeth Aug 05 '16 at 11:06
4

First split (T-SQL Split Word into characters) both words into temporary tables. Then perform an outer join and check for nulls.

Edit thanks to George's comment:

  1. split (T-SQL Split Word into characters) both words into temporary tables
  2. Modify temporary tables or use CTEs to add a column with count(*) with group by letters clause
  3. Perform a full outer join on two temporary tables using a letter and it's count in join condition
  4. Check for nulls in the output - if there are none, you have an anagram
Community
  • 1
  • 1
PacoDePaco
  • 689
  • 5
  • 16
  • Be careful with duplicate characters. For example, the words `ABCC` and `ABBC` could show up as anagram, while they are not. – SQL Police Aug 05 '16 at 10:10
  • 1
    True. Another column storing counts of each letter in both tables and join condition should fix it. – PacoDePaco Aug 05 '16 at 10:12
2

The first get in my mind:

DECLARE @word1 nvarchar(max) = NULL,
        @word2 nvarchar(max) = 'Test 1',
        @i int = 0, @n int

DECLARE @table TABLE (
    id int,
    letter int
)

SELECT @word1 = ISNULL(LOWER(@word1),''), @word2 = ISNULL(LOWER(@word2),'')

SELECT @n = CASE WHEN LEN(@word1) > LEN(@word2) THEN LEN(@word1) ELSE LEN(@word2) END

WHILE @n > 0
BEGIN
    INSERT INTO @table
    SELECT 1, ASCII(SUBSTRING(@word1,@n,1))
    UNION ALL
    SELECT 2, ASCII(SUBSTRING(@word2,@n,1))
    SET @n=@n-1
END

SELECT CASE WHEN COUNT(*) = 0 THEN 1 ELSE 0 END isAnagram
FROM (
    SELECT id, letter, COUNT(letter) as c
    FROM @table
    WHERE id = 1
    GROUP BY id, letter)as t
FULL OUTER JOIN (
    SELECT id, letter, COUNT(letter) as c
    FROM @table
    WHERE id = 2
    GROUP BY id, letter) as p
    ON  t.letter = p.letter and t.c =p.c
WHERE t.letter is NULL OR p.letter is null

Output:

isAnagram
0
gofr1
  • 15,741
  • 11
  • 42
  • 52
  • Ah, add count to fight with duplicate letters :) – gofr1 Aug 05 '16 at 11:23
  • Thanks for checking :) I used few real words and there were ok (of course) – gofr1 Aug 05 '16 at 11:31
  • Check out "Test1" vs "NULL" AND "Test1Test2" vs "Test1 Test2", they will return "1". – Jarle Bjørnbeth Aug 05 '16 at 12:03
  • @JarleBjørnbeth this will check only words (in question there is no word about phrase anagrams), NULLs and another stuff could be easily checked with ISNULL. It is a sample of query, if you want it on production - please add all restrictions you need. :) – gofr1 Aug 05 '16 at 12:08
  • @gofr1 I think it is at least worth noting, it might not be the behavior someone expected. Especially if they are not as experienced with SQL :) – Jarle Bjørnbeth Aug 05 '16 at 12:17
  • 1
    @JarleBjørnbeth Thanks, I heard you and add some checks :) Now it is insensitive to lower/upper cases if someone is interested! – gofr1 Aug 05 '16 at 12:27
  • 1
    @gofr1 Hehe, nice touch :) – Jarle Bjørnbeth Aug 05 '16 at 12:29
2

You can also use loops in functions, and they can work fast. I am not able to get any of the of other answers even close to the performance of this function:

CREATE FUNCTION IsAnagram 
(
    @value1 VARCHAR(255)
    , @value2 VARCHAR(255)
) 
RETURNS BIT
BEGIN

    IF(LEN(@value1) != LEN(@value2))
        RETURN 0;

    DECLARE @firstChar VARCHAR(3);

    WHILE (LEN(@value1) > 0)
    BEGIN
        SET @firstChar = CONCAT('%', LEFT(@value1, 1), '%');

        IF(PATINDEX(@firstChar, @value2) > 0)
            SET @value2 = STUFF(@value2, PATINDEX(@firstChar, @value2), 1, '');
        ELSE
            RETURN 0;

        SET @value1 = STUFF(@value1, 1, 1, '');

    END

    RETURN (SELECT IIF(@value2 = '', 1, 0));

END

GO

SELECT dbo.IsAnagram('asd', 'asd')
--1
SELECT dbo.IsAnagram('asd', 'dsa')
--1
SELECT dbo.IsAnagram('assd', 'dsa')
--0
SELECT dbo.IsAnagram('asd', 'dssa')
--0
SELECT dbo.IsAnagram('asd', 'asd')
2

This is something a numbers table can help with.

Code to create and populate a small numbers table is below.

CREATE TABLE dbo.Numbers
  (
     Number INT PRIMARY KEY
  );

WITH Ten(N) AS 
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL 
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
)   
INSERT INTO dbo.Numbers
SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) AS Number
FROM   Ten T10,
       Ten T100,
       Ten T1000

Once that is in place you can use

SELECT W1,
       W2,
       IsAnagram = CASE
                     WHEN LEN(W1) <> LEN(W2)
                       THEN 0
                     ELSE
                       CASE
                         WHEN EXISTS (SELECT SUBSTRING(W1, Number, 1),
                                             COUNT(*)
                                      FROM   dbo.Numbers
                                      WHERE  Number <= LEN(W1)
                                      GROUP  BY SUBSTRING(W1, Number, 1)
                                      EXCEPT
                                      SELECT SUBSTRING(W2, Number, 1),
                                             COUNT(*)
                                      FROM   dbo.Numbers
                                      WHERE  Number <= LEN(W2)
                                      GROUP  BY SUBSTRING(W2, Number, 1))
                           THEN 0
                         ELSE 1
                       END
                   END 
FROM  (VALUES
        ('CAT', 'ACT'),
        ('CAR', 'RAC'),
        ('BUZ', 'BUS'),
        ('FUZZY', 'MUZZY'),
        ('PACK', 'PACKS'),
        ('AA', 'AA'),
        ('ABCDEFG', 'GFEDCBA')) T(W1, W2)

Or an alternative implementation could be

   IsAnagram = CASE
                 WHEN LEN(W1) <> LEN(W2)
                   THEN 0
                 ELSE
                   CASE
                     WHEN EXISTS (SELECT 1
                                  FROM   dbo.Numbers N
                                         CROSS APPLY (VALUES(1,W1),
                                                            (2,W2)) V(Col, String)
                                  WHERE  N.Number <= LEN(W1)
                                  GROUP  BY SUBSTRING(String, Number, 1)
                                  HAVING COUNT(CASE WHEN Col = 1 THEN 1 END) <> 
                                         COUNT(CASE WHEN Col = 2 THEN 1 END))
                       THEN 0
                     ELSE 1
                   END
               END 
Martin Smith
  • 438,706
  • 87
  • 741
  • 845