3

I want to find out the longest sequence of letter in a string

e.g. in the word Honorificabcdwert , the output will be abcd.

How can I do it?

My idea is to get the Ascii and then count the sequence until it breaks at some point. But I was able to proceed with only

DECLARE @t TABLE(ID INT IDENTITY,String VARCHAR(100))
INSERT INTO @t SELECT 'Honorificabcdwert'

;with Get_Individual_Chars_Cte AS
( 
   SELECT 
        ID
        ,Row_ID =ROW_NUMBER() Over(PARTITION by ID Order by ID) 
        ,SUBSTRING(String,Number,1) AS [Char]
        ,ASCII(SUBSTRING(String,Number,1)) AS [Ascii Value]

FROM @t  
INNER JOIN master.dbo.spt_values ON
 Number BETWEEN 1 AND LEN(String)
 AND type='P'

)

Select * from Get_Individual_Chars_Cte 

After this I don't know what to do. Help needed for this or any other way of doing so.

Mat
  • 202,337
  • 40
  • 393
  • 406
  • This can be reduced to finding the [longest common substring](http://en.wikipedia.org/wiki/Longest_common_substring_problem) with a string representing the alphabet. I do not know how to translate the pseudocode fro the wiki article to tsql. – Sergey Kalinichenko Sep 22 '12 at 10:27
  • Please give expected output for the following strings `AbCdEfxy` and `abc1234defg` – Martin Smith Sep 22 '12 at 11:10
  • For Input "AbCdEfxy" , the output will be "ABCDEF". For Input "abc1234defg" , output is "DEFG" –  Sep 24 '12 at 04:09

2 Answers2

2

Will this help

DECLARE @t TABLE(ID INT IDENTITY,String VARCHAR(100))
INSERT INTO @t 
SELECT 'Honorificabcdwert' UNION ALL
SELECT 'AbCdEfxy' UNION ALL
SELECT 'abc1234defg' UNION ALL
SELECT 'XYZABCPPCKLMIDBABC' UNION ALL
SELECT 'MNOP$%^&~()MNOPQRS;:'


SELECT ID, OriginalString,Sequence
FROM (SELECT ID, REPLACE(string,'%','') AS Sequence,OriginalString,
      ROW_NUMBER() OVER(PARTITION BY ID ORDER BY LEN(string) DESC, string) AS rn
      FROM (SELECT OriginalString = b.String, CASE WHEN b.String LIKE a.strings THEN a.strings ELSE NULL END AS string, 
            b.ID, ROW_NUMBER() OVER(PARTITION BY ID ORDER BY LEN(strings) DESC, strings) AS rn
            FROM (SELECT COALESCE('%' + b.strings+a.strings + '%','%' + a.strings + '%') AS strings
                  FROM (SELECT SUBSTRING('ABCDEFGHIJKLMNOPQRSTUVWXYZ',t1.N,t2.N-t1.N+1) AS strings, t1.N
                        FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
                                    (9),(10),(11),(12),(13),(14),(15),
                                    (16),(17),(18),(19),(20),(21),(22),
                                    (23),(24),(25),(26)) t1(N)
                        CROSS JOIN (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
                                          (9),(10),(11),(12),(13),(14),(15),
                                          (16),(17),(18),(19),(20),(21),(22),
                                          (23),(24),(25),(26)) t2(N)
                        WHERE t1.N <= t2.N) a
                  LEFT OUTER JOIN (SELECT REVERSE(SUBSTRING('ZYXWVUTSRQPONMLKJIHGFEDCBA',1,N)) AS strings, 1 AS ID
                                   FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),
                                               (9),(10),(11),(12),(13),(14),(15),
                                               (16),(17),(18),(19),(20),(21),(22),
                                               (23),(24),(25),(26)) t1(N)
                                   UNION ALL SELECT '', 1) b ON a.N = b.ID) a
            CROSS JOIN @t b) a ) a
WHERE a.rn = 1
ORDER BY a.ID

Result

ID  OriginalString          Sequence
1   Honorificabcdwert       ABCD
2   AbCdEfxy                ABCDEF
3   abc1234defg             DEFG
4   XYZABCPPCKLMIDBABC      XYZABC
5   MNOP$%^&~()MNOPQRS;:    MNOPQRS

Based on your inputs provided in the course of discussion with @Martin Smith, the program is being developed. Please test it and let me know if it satisfies your requirement.

Niladri Biswas
  • 4,153
  • 2
  • 17
  • 24
0

For consecutive rows with characters rising in alphabetical order (equating alphabetical order with ASCII order here) ROW_NUMBER() OVER (ORDER BY Row_ID) - [Ascii Value] will be the same.

This is not sufficient on its own however as for the string ABCZE that would put E in the same group as ABC so then you need a second operation to find gaps in that grouping sequence.

Something like the following should do it.

DECLARE @t TABLE(ID INT IDENTITY,String VARCHAR(100))
INSERT INTO @t SELECT 'Honorificabcdwfrt'

;with Get_Individual_Chars_Cte AS
( 
   SELECT 
        ID
        ,Row_ID =ROW_NUMBER() Over(PARTITION by ID Order by ID) 
        ,SUBSTRING(String,number,1) AS [Char]
        ,ASCII(SUBSTRING(String,number,1)) AS [Ascii Value]

FROM @t  
INNER JOIN master.dbo.spt_values ON
 number BETWEEN 1 AND LEN(String)
 AND type='P'
)
, T1 AS
(
Select *,
       ROW_NUMBER() OVER (ORDER BY Row_ID) - [Ascii Value] AS RN
 from Get_Individual_Chars_Cte 
 ), T2 AS
 (
 SELECT *,
        ROW_NUMBER() OVER (ORDER BY Row_ID) - 
        ROW_NUMBER() OVER (PARTITION BY RN ORDER BY Row_ID) AS Grp
 FROM T1
 )
 SELECT TOP 1 WITH TIES *
 FROM T2
 ORDER BY  COUNT(*) OVER (PARTITION BY RN, Grp) DESC
Martin Smith
  • 438,706
  • 87
  • 741
  • 845
  • Sir, it failed for "XYZABCPPCKLMIDBABC" where the expected output is "XYZABC" –  Sep 22 '12 at 10:54
  • You didn't state in your question that the alphabet wraps round! Traditionally `A` is not considered to come after `Z`. It returns the 4 different groups of 3 in that case. You should be able to amend this by doing a mod 26 operation on the character codes. You also haven't been clear as how non alphabet characters and different cases should be treated. – Martin Smith Sep 22 '12 at 10:55
  • Sir, sorry for that. We should consider only alphabets and if anything else is there (like non-alphabets /special characters e.g. 1,2 @,# etc) then it needs to be ignore –  Sep 22 '12 at 11:02
  • And what about case sensitivity? Is `H` equal to `h`? – Martin Smith Sep 22 '12 at 11:02
  • So how come `Z` comes after `A`? Why does it wrap round to an upper case `A`? where do lower case letters come in the sequence? – Martin Smith Sep 22 '12 at 11:05
  • It is case insensitive (for the time being) so for an input "ABCdeF12ac" , output will be ABCDEF –  Sep 24 '12 at 04:09