1

sort by first matching number then by second matching number in SQL

Suppose, I have a table entries as following.

Btc0504
Btc_0007_Shd_01
Btc_007_Shd_01
Bcd0007_Shd_7
ptc00044
Brg0007_Shd_6
Btc0075_Shd
Bcc43
MR_Tst_etc0565
wtc0004_Shd_4
vtc_Btc0605

so it should bring records as following.

wtc0004_Shd_4
Bcc43
ptc00044
Btc_007_Shd_01
Btc_0007_Shd_01
Brg0007_Shd_6
Bcd0007_Shd_7
Btc0075_Shd
Btc0504
MR_Tst_etc0565
Btc_vtc0605

So basically it sorts by numbers only, words are only separator of numbers.

Here middle strings can be of any numbers.

They are not fixed and this pattern is also not fixed.

so there can be more strings and numbers with row. i.e. a1b2c3d4e5..., u7g2u9w2s8...

So require dynamic solution.

Example table is given below.

http://rextester.com/IDQ22263

Anonymous Creator
  • 2,968
  • 7
  • 31
  • 77
  • use this UDF and sort by this function: https://stackoverflow.com/a/16667431/440030 – Reza ArabQaeni Jun 11 '18 at 06:36
  • 1
    Probably you could write some CLR function to do this with regular expressions. – Cetin Basoz Jun 11 '18 at 06:42
  • @RezaArabQaeni. I tried it. It gives me Btc0075_Shd after Bcd0007_Shd_7. because it removes all alphabets. so it brings 77 before 75. where as my comparison requires 7 and 7 differently. – Anonymous Creator Jun 11 '18 at 06:46
  • @CetinBasoz. Ya I also thought that way. But I need to paginate records based on this (like bring only top 10 records of this sort). so in CLR i will have to bring all records to do this. – Anonymous Creator Jun 11 '18 at 06:47
  • No, you don't. With a CLR UDF you could do the filtering in place. It works like using any built-in function. It also supports context connection. – Cetin Basoz Jun 11 '18 at 06:49
  • @CetinBasoz. Oh my bad (I started thinking about .Net CLR, Wasnt so good at abbreviations.). Ya. will try that way if not any other way. :) – Anonymous Creator Jun 11 '18 at 06:53
  • Yes it IS .Net CLR and it is supported by SQL server. I use it for things where I find it hard or slow to do with MS SQL T-SQL itself. For example, I created a C# function that parses a comma separated list of IDs to a table and then returns matching rows (an IN query). I had the same thing in T-SQL but it was timing out if the IDs were over 10-15K. With CLR UDF, even 20-30K IDs returned data sub second. – Cetin Basoz Jun 11 '18 at 06:58
  • Without using fixed markers such as `Btc`, I don't think it's possible to write a general query in SQL Server. SQL Server does not have any built in regex support, and regex is really what you would need for the general solution you have in mind. – Tim Biegeleisen Jun 11 '18 at 07:03
  • Note: You might try with translate() and string_split() for a solution in T-SQL but I still think it would be best with CLR. – Cetin Basoz Jun 11 '18 at 07:26
  • Am I the only one who understood your question correctly :) Or am the one who didn't understand at all ? Would you please edit your question how would 'Btc_007_Shd_01' or 'Mr0007ec01' sort. – Cetin Basoz Jun 11 '18 at 07:57
  • @CetinBasoz. I think yes. U r the only one (or at least first one). In ur case it will be great if 007 can come first instead of 0007. :) Updated Question. – Anonymous Creator Jun 11 '18 at 07:59
  • @CetinBasoz But we can ignore that zeros as it is not on my priority if any comes first. :) – Anonymous Creator Jun 11 '18 at 08:04
  • Yes that is what I understood too, Btc0007_Shd_01 would be treated as 7 and 1. (I think I will write a CLR UDF at the end:) – Cetin Basoz Jun 11 '18 at 08:07

5 Answers5

2

Assuming you would have 2 number blocks at most and each number would be 10 digits at most, I created a sample CLR UDF like this for you (DbProject - SQL CLR Database project):

using System.Collections.Generic;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;

public partial class UserDefinedFunctions
{
    [Microsoft.SqlServer.Server.SqlFunction]
    public static SqlString CustomStringParser(SqlString str)
    {
        int depth = 2; // 2 numbers at most
        int width = 10; // 10 digits at most

        List<string> numbers = new List<string>();
        var matches = Regex.Matches((string)str, @"\d+");
        foreach (Match match in matches)
        {
            numbers.Add(int.Parse(match.Value).ToString().PadLeft(width, '0'));
        }
        return string.Join("", numbers.ToArray()).PadRight(depth*width);
    }
}

I added this to the 'test' database as follows:

IF EXISTS ( SELECT  *
            FROM    sys.objects
            WHERE   object_id = OBJECT_ID(N'[dbo].[ufn_MyCustomParser]') AND
                    type IN ( N'FN', N'IF', N'TF', N'FS', N'FT' ) )
  DROP FUNCTION [dbo].[ufn_MyCustomParser]
GO
IF EXISTS ( SELECT  *
            FROM    sys.[assemblies] AS [a]
            WHERE   [a].[name] = 'DbProject' AND
                    [a].[is_user_defined] = 1 )
  DROP ASSEMBLY DbProject;
GO


CREATE ASSEMBLY DbProject
FROM 'C:\SQLCLR\DbProject\DbProject\bin\Debug\DbProject.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION ufn_MyCustomParser ( @csv NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS EXTERNAL NAME
  DbProject.[UserDefinedFunctions].CustomStringParser;
GO

Note: SQL server 2012 (2017 has strict security problem that you need to handle).

Finally tested with this T-SQL:

declare @MyTable table (col1 varchar(50));
insert into @MyTable values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_01'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_BwwwQAZtc0605'),
('Btc_Bwwwwe12541edddddtc0605'),
('QARTa1b2');
SELECT * FROM @MyTable
ORDER BY dbo.ufn_MyCustomParser(col1);

Output:

col1
QARTa1b2
Btc0004_Shd_4
Btc0007_Shd_01
Btc0007_Shd_6
Btc0007_Shd_7
Btc0504
MR_Tst_Btc0565
Btc_BwwwQAZtc0605
Btc_Bwwwwe12541edddddtc0605
Cetin Basoz
  • 22,495
  • 3
  • 31
  • 39
1

Below query does the following: it uses patindex function, to extract index in a string of a pattern:

  1. firstly, it extracts beginning of number, searching for a digit.

  2. Secondly, it extracts end of a number searching for digit followed by non-digit.

Having done that, we have everything to extract a nuber from a string and sort by it after converting (casting) it to an integer.

Try this query:

declare @tbl table (col1 varchar(50));
insert into @tbl values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_Btc0605');

select col1 from (
    select col1,
           PATINDEX('%[0-9]%', col1) [startIndex],
           case PATINDEX('%[0-9][^0-9]%', col1) when 0 then LEN(col1) else     PATINDEX('%[0-9][^0-9]%', col1) end [endIndex]
    from @tbl
) [a]
order by CAST(SUBSTRING(col1, startIndex, endIndex - startIndex + 1) as int)

I came up with another solution, which is very compact and more general:

;with cte as (
    select 1 [n], col1, STUFF(col1, PATINDEX('%[^0-9]%', col1), 1, '.') refined_col1 from @tbl
    union all
    select n+1, col1, STUFF(refined_col1, PATINDEX('%[^0-9.]%', refined_col1), 1, '.') from cte
    where n < 100 -- <--this number must be greater than the greatest amount of non-digits in a col1, this way, you are sure that you'll remove all unnecesary characters
)

select col1, refined_col1 from cte
where PATINDEX('%[^0-9.]%', refined_col1) = 0
order by CAST(replace(refined_col1, '.', '') as int)
option (maxrecursion 0)
Michał Turczyn
  • 32,028
  • 14
  • 47
  • 69
0

I will begin my answer by saying that the best long term solution for you is to fix your data model. If you have need to use the various portions of the entry in queries, for sorting, etc., then consider storing them in separate bona-fide columns.

That being said, one workaround is to use basic string operations to extract the two components you want so use for sorting. Note carefully that we have to cast them to numbers, because otherwise they won't sort properly as text.

SELECT *
FROM entries
ORDER BY
    CAST(SUBSTRING(entry, PATINDEX('%Btc[0-9]%', entry) + 3, 4) AS INT),
    CASE WHEN CHARINDEX('Shd_', entry) > 0
         THEN
         CAST(SUBSTRING(entry,
                        CHARINDEX('Shd_', entry) + 4,
                        LEN(entry) - CHARINDEX('Shd_', entry) -4) AS INT)
         ELSE 1 END;

enter image description here

Demo

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
0

At the beginning I does not recommend the next approach for performance aspect , you should fix the root cause of your data.

For handling dynamic inputs, I Think you should create UDF function for extracting the numbers only like next:-

CREATE FUNCTION dbo.udf_ExtratcNumbersOnly
(@string VARCHAR(256))
RETURNS int
AS
BEGIN
    WHILE PATINDEX('%[^0-9]%',@string) <> 0
    SET @string = STUFF(@string,PATINDEX('%[^0-9]%',@string),1,'')
    RETURN cast (@string as int)
END
GO

Then use it as next:-

declare @MyTable table (col1 varchar(50));
insert into @MyTable values
('Btc0504'),
('Btc0007_Shd_7'),
('Btc0007_Shd_6'),
('MR_Tst_Btc0565'),
('Btc0004_Shd_4'),
('Btc_BwwwQAZtc0605'),
('Btc_Bwwwwe12541edddddtc0605'),
('QARTa1b2c3d4e5');

select * from @MyTable 
order by (dbo.udf_ExtratcNumbersOnly(col1))

Result:-

Btc0004_Shd_4
Btc0007_Shd_6
Btc0007_Shd_7
Btc0504
MR_Tst_Btc0565
Btc_BwwwQAZtc0605
QARTa1b2c3d4e5
Btc_Bwwwwe12541edddddtc0605

Demo.

ahmed abdelqader
  • 3,409
  • 17
  • 36
0

You can use a tally table/numbers table to get each character and find only numbers and then combine the numbers in order to form a string(which can be casted into bigint). Then you can order based on this string.

See working demo

; with numbers as (
    select top 10000
        r= row_number() over( order by (select null))
    from sys.objects o1 
        cross join sys.objects o2
   )

, onlynumbers as
(
    select * from t 
    cross apply
    ( select part =substring(num,r,1),r
      from numbers where r<=len(num)
     )y
    where part  like '[0-9]' 
)

, finalorder as
(
    select num,cast(replace(stuff
    ((
        select ','+part
        from onlynumbers o2 
        where o2.num=o1.num
        order by o2.r
        for xml path('')
        ),1,1,''),',','') as bigint) b
  from onlynumbers o1
  group by num
   )
 select num from finalorder order by b asc
DhruvJoshi
  • 17,041
  • 6
  • 41
  • 60
  • Like other answers, this too is ignoring the fact that OPs pattern might have 2 blocks of numbers and in that case the sorting should be done first by first block and then the second. For example, in your sample try including 'Btc0007_Shd_01' which should be returned as the 1st row. As I said, it is best not to try to solve this by T-SQL but CLR. – Cetin Basoz Jun 11 '18 at 07:06
  • @CetinBasoz I think you would need regex for a reliable arbitrary solution. Since the OP is using SQL Server, this is not possible. – Tim Biegeleisen Jun 11 '18 at 07:11
  • @TimBiegeleisen, "impossible" ???? A strong word even for SQL server. As I already said a few times, SQL server supports CLR and you could do Regex matching (not a must in this case) with CLR. – Cetin Basoz Jun 11 '18 at 07:14