Finding strings with duplicate letters inside

Question

Can somebody help me with this little task? What I need is a stored procedure that can find duplicate letters (in a row) in a string from a table "a" and after that make a new table "b" with just the id of the string that has a duplicate letter.

Something like this:

Table A

ID Name   
1  Matt
2  Daave
3  Toom
4  Mike
5  Eddie

And from that table I can see that Daave, Toom, Eddie have duplicate letters in a row and I would like to make a new table and list their ID's only. Something like:

Table B

Only 2,3,5 because that is the ID of the string that has duplicate letters in their names.

I hope this is understandable and would be very grateful for any help.

Matt also has a pair of letters. What SQL engine are you using? — Gordon Linoff, Jan 29 '13 at 03:08
Yea my mistake, Matt should be included also :D I am using sql server — gocman, Jan 29 '13 at 03:47

veljasije · Answer 1 · 2013-01-29T22:19:37.040

4

In your answer with stored procedure, you have 2 mistakes, one is missing space between column name and LIKE clause, second is missing single quotes around search parameter.

I first create user-defined scalar function which return 1 if string contains duplicate letters:

EDITED

CREATE FUNCTION FindDuplicateLetters
(
    @String NVARCHAR(50)
)
RETURNS BIT
AS
BEGIN

    DECLARE @Result BIT = 0 
    DECLARE @Counter INT = 1

    WHILE (@Counter <= LEN(@String) - 1) 
    BEGIN


    IF(ASCII((SELECT SUBSTRING(@String, @Counter, 1))) = ASCII((SELECT SUBSTRING(@String, @Counter + 1, 1))))
        BEGIN
             SET @Result = 1
             BREAK
        END


        SET @Counter = @Counter + 1 
    END

    RETURN @Result

END
GO

After function was created, just call it from simple SELECT query like following:

SELECT 
    * 
FROM
    (SELECT 
        *, 
        dbo.FindDuplicateLetters(ColumnName) AS Duplicates
    FROM TableName) AS a
WHERE a.Duplicates = 1

With this combination, you will get just rows that has duplicate letters.

edited Jan 29 '13 at 22:19

answered Jan 29 '13 at 15:00

veljasije

6,722
12
48
79

I tried that it works :). Ty, i am newbie in sql server and this was very helpful. Now it works as it should be, thank you all for help. – gocman Jan 29 '13 at 15:17
@user2020224 Try function above, it will work with all letters, beside collations, because it compare ascii codes. It is up to you to define table and column on which you want to work, but instead you don't have to specify letter combination. If this is helpful, mark it as answer. – veljasije Jan 30 '13 at 13:57
This was great. I also needed to check for consecutive letters, so I edited to check: IF(ASCII((SELECT SUBSTRING((at)String, (at)Counter, 1))) = ASCII((SELECT SUBSTRING((at)String, (at)Counter + 1, 1)))-1) – Mike Jan 04 '22 at 22:48

Gordon Linoff · Answer 2 · 2013-01-29T14:03:11.833

3

In any version of SQL, you can do this with a brute force approach:

select *
from t
where t.name like '%aa%' or
      t.name like '%bb%' or
      . . .
      t.name like '%zz%'

If you have a case sensitive collation, then use:

where lower(t.name) like '%aa%' or
      . . .

edited Jan 29 '13 at 14:03

answered Jan 29 '13 at 03:15

Gordon Linoff

1,242,037
58
646
786

+1 I'm sure this will work just fine, but what happens when you encounter `Mr. Töötles`? Not to suggest it wouldn't work, SQL Server might normalise the `ö` to `o` considering `'%aa%'` etc are `VARCHAR`. – ta.speot.is Jan 29 '13 at 05:59

score 2 · Answer 3 · answered Mar 02 '18 at 20:12

Here's one way.

First create a table of numbers

CREATE TABLE dbo.Numbers
  (
     number INT PRIMARY KEY
  );

INSERT INTO dbo.Numbers
SELECT number
FROM   master..spt_values
WHERE  type = 'P'
       AND number > 0;

Then with that in place you can use

SELECT *
FROM   TableA
WHERE  EXISTS (SELECT *
               FROM   dbo.Numbers
               WHERE  number < LEN(Name)
                      AND SUBSTRING(Name, number, 1) = SUBSTRING(Name, number + 1, 1))

score 1 · Answer 4 · answered Aug 02 '17 at 21:40

Though this is an old post it's worth posting a solution that will be faster than a brute force approach or one that uses a scalar udf (which generally drag down performance). Using NGrams8K this is rather simple.

--sample data
declare @table table (id int identity primary key, [name] varchar(20));
insert @table([name]) values ('Mattaa'),('Daave'),('Toom'),('Mike'),('Eddie');

-- solution #1
select id
from @table
cross apply dbo.NGrams8k([name],1)
where charindex(replicate(token,2), [name]) > 0
group by id;

-- solution #2 (SQL 2012+ solution using LAG)
select id
from
(
  select id, token, prevToken = lag(token,1) over (partition by id order by position)
  from @table
  cross apply dbo.NGrams8k([name],1)
) prep
where token = prevToken
group by id; -- optional id you want to remove possible duplicates.

score 0 · Answer 5 · edited Jan 31 '23 at 07:12

0

another burte force way:

select *

from t

where t.name ~ '(.)\1';

edited Jan 31 '23 at 07:12

Ramil Aliyev 007

4,437
2
31
47

answered Jan 26 '23 at 15:45

Marcia Cripps

1
1

2

In SQL Server, which the OP finally stated he was using, you get the following when using your good code... Msg 102, Level 15, State 1, Line 15 Incorrect syntax near '~'. – Jeff Moden Jan 26 '23 at 22:50

Finding strings with duplicate letters inside

5 Answers5

Linked