1

Note: I believe this is no duplicate of this seemingly related question, since I’ve got actual strings that contain a number instead of integers stored as strings like the other OP.

I’m looking for a way to output SQL strings ordered in natural order by a number they contain, like a numerical infix. For example, Windows Explorer does that naturally:

screenshot from Windows Explorer

On DBA, I read that numbers alone have no properties for alphabetical sorting, but I’m sure there’s some way of doing that for infix numbers without leading zeros, because they are significant.

In SQL Server 2008 R2, my scenario is somewhat like this:

DECLARE @Table TABLE (ID INT, Name VARCHAR(10));
INSERT INTO @Table (ID, Name) VALUES
(1,'name1file'),   (5,'name11file'),
(2,'name2file'),   (6,'name20file'),
(3,'name3file'),   (7,'name21file'),
(4,'name10file');
SELECT * FROM @Table ORDER BY Name COLLATE Latin1_General_100_BIN ASC;

Exemplary result sets:

 Output got         Output desired
 ID | Name          ID | Name
 ==========         ==========
 1  | name1file     1  | name1file
 4  | name10file    2  | name2file
 5  | name11file    3  | name3file
 2  | name2file     4  | name10file
 6  | name20file    5  | name11file
 7  | name21file    6  | name20file
 3  | name3file     7  | name21file

The ID column is for illustrative purposed only (you could just ORDER BY ID in that case). It’s just the VARCHAR column I query. I tried using COLLATE, but none of the fn_helpcollations I tried produced the desired order.

So, is there a way to achieve that without using SUBSTRING or PATINDEX or CLR functions or other means of extracting the number, maybe by using an appropriate collation?

If not, is there a reason this common use case is not implemented in any of the thousands of collations? I hoped that Windows Explorer used some common collation for that instead of a function implementation like PHP’s natsort().

Community
  • 1
  • 1
dakab
  • 5,379
  • 9
  • 43
  • 67
  • can you please give some original file names? is the prefix and suffix of the number constant, like you have given in the example: 'name' and 'file'. – Deep Kalra Aug 21 '15 at 08:22
  • 2
    Do filenames always have name#file format? If so try `ORDER BY LEN(Name), Name` – Lukasz Szozda Aug 21 '15 at 08:26
  • The same question on PostgreSQL - http://stackoverflow.com/questions/12965463/humanized-or-natural-number-sorting-of-mixed-word-and-number-strings Neither of the solutions will work on MS SQL without using CLR, though. And it's not going to be fast in any case... – Luaan Aug 21 '15 at 08:29
  • Yes it is constant, the string format does not change except for the numbering. @lad2025: That works! If you could explain _why_, this might qualify as an answer rather than a comment. – dakab Aug 21 '15 at 08:38

1 Answers1

2

If your files have the name in format name###file you can sort it using

SELECT * FROM @Table ORDER BY LEN(Name), Name

This sorting is simple, first sort by length of Name then by Name. Your file name is constant and only number part changed, so "5", "1" and "2" are "before "10" based on length. Second ordering gives correct order between number in the same magnitude (0-9) (10-99) (100-999) and so on.

Keep in mind that it is not perfect general solution for example: "z" < "aa".

Lukasz Szozda
  • 162,964
  • 23
  • 234
  • 275
  • The strings do maintain a format like this, only the numeric portions differs. Your solution works! Your answer could be further improved by adding a short explanation _how_ `LEN` is doing the magic. – dakab Aug 21 '15 at 08:40
  • That’s a really clever, yet amazingly simple heuristic, thank you for it. It will thoroughly work character-wise in the respective order (that’s natural), which is quite fine. And isn’t in fact `“z” < “aa” (< “az” < “aaa” …)`? – dakab Aug 22 '15 at 16:50
  • Yes "z" < "aa" < "az" < "aaa" < "ccc" < "aaaa" and so on. Length take precedense. – Lukasz Szozda Aug 22 '15 at 16:57
  • Just think on another solution and you can also try something like this: `ORDER BY REPLACE(Name, '[non-digit-regex]', '')`, so simply replace all non number character to '' (empty string), but it definitely be slower. – Lukasz Szozda Aug 22 '15 at 17:00
  • For development purposes, performance is acceptable. And by replacing constants one could expand this solution to other scenarios. Precedence by length means by `0-9a-z`, and that’s natural order. Isn’t that great of an answer? – dakab Aug 22 '15 at 17:08
  • One more thing, replacing non digit will end with 0, 1, 10, and so on, so you need also PAD result something like `ORDER BY (RIGHT(REPLACE(Name, '[non-digit-regex]', '') + REPLICATE('0',5),5))`;. Better stay with original answer :) – Lukasz Szozda Aug 22 '15 at 17:12