2

Imagine i have a data set that contains:

Date            Id
--------------  ----
11/1/2017       null
11/4/2017       3
11/5/2017       null
11/12/2017      10
null            1
null            2
null            7
null            8
null            9

I want the rows ordered so that both columns are increasing.

Using a naïve ORDER BY Date, ID does not do it:

enter image description here

There is an ordering

There is an ordering that satisfies the results of my desired sort order:

  • the date column is always increasing
  • the id column value is always increasing

 

enter image description here

Or course, that's not a unique ordering:

Date            Id
--------------  ---------------
null            1
11/1/2017       null
null            2
11/4/2017       3
null            7
null            8
null            9
11/5/2017       null
11/12/2017      10

A programming language could do it

I know i can accomplish this on the client side. In a functional functional programming language: use a stable sorting algorithm:

A stable sort is one which preserves the original order of the input set, where the comparison algorithm does not distinguish between two or more items.

Consider a sorting algorithm that sorts cards by rank, but not by suit. The stable sort will guarantee that the original order of cards having the same rank is preserved; the unstable sort will not.

enter image description here

Unfortunately i have

  • 9.1 million rows
  • 1.8 GB

of monotonically increasing rows to put in best possible chronological sort order. Obviously i'd prefer to do this on the server - which is well suited to handling large amounts of data.

How can i perform a stable-sort in SQL Server?


Example Data

 

CREATE TABLE #SortDemo (Date datetime NULL, Id int NULL)

INSERT INTO #SortDemo (Date, Id)
VALUES 
    ('20171101', null),
    ('20171104',    3),
    ('20171105', null),
    ('20171112',   10),
    (null,          1),
    (null,          2),
    (null,          7),
    (null,          8),
    (null,          9)


SELECT * FROM #SortDemo
ORDER BY Date, Id
Ian Boyd
  • 246,734
  • 253
  • 869
  • 1,219
  • 1
    My understanding of databases is in line with [this answer](https://stackoverflow.com/questions/18613055/is-order-by-and-row-number-deterministic#), meaning table is a bag collection with no natural order to preserve. – orhtej2 Dec 01 '17 at 22:09
  • 2
    Rows in a table have no inherent or reliable order. Your assumption is flawed and cannot be implemented without something in each row that identifies the original order. Do not be fooled by small sets of static data. – SMor Dec 02 '17 at 10:44
  • yep the "original order of the input set" isn't something that databases care about or preserve. without that you just get the order in whatever the execution plan operators produce with no guarantee of repeatibility. You need to have a column with the desired order if you care about this. – Martin Smith Dec 02 '17 at 20:31

1 Answers1

1

It is a solvable problem. You don't have to throw your hands up and say computers cannot be used to solve problems.

Start with the data, and add a new surrogate "Rank" column.

Date            Id    Rank
--------------  ----  ----
null            7     null
null            1     null
null            9     null
null            2     null  
null            8     null
11/1/2017       null  null
11/4/2017       3     null
11/5/2017       null  null
11/12/2017      10    null
11/13/2017      null  null

Then seed Rank to the Id:

UPDATE SortDemo SET Rank = Id
WHERE Id IS NOT NULL

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  null
11/4/2017       3     3
11/5/2017       null  null 
11/12/2017      10    10
11/13/2017      null  null

Then for the remaining items items with a Date, we need to assign it to the "next" rank:

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  null  <-- 3
11/4/2017       3     3     
11/5/2017       null  null  <-- 10
11/12/2017      10    10    
11/13/2017      null  null  <-- ?

with

UPDATE SortDemo SET Rank = (
      SELECT MIN(Rank) FROM SortDemo s2 
      WHERE s2.Date >= SortDemo.Date)
WHERE SortDemo.Date IS NOT NULL

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  3    <--
11/4/2017       3     3     
11/5/2017       null  10   <--
11/12/2017      10    10    
11/13/2017      null  null <-- ?

There's also the edge case where there are no items "after" us; which we can fix by going backwards to one the highest previous rank:

UPDATE SortDemo SET Rank = (
      SELECT MAX(Rank) FROM SortDemo s2 
      WHERE s2.Date <= SortDemo.Date)
WHERE SortDemo.Date IS NOT NULL

Date            Id    Rank
--------------  ----  ----
null            7     7
null            1     1 
null            9     9 
null            2     2 
null            8     8 
11/1/2017       null  3
11/4/2017       3     3     
11/5/2017       null  10
11/12/2017      10    10    
11/13/2017      null  10 <--10

And we're done

We can now sort by surrogate Rank, and break ties by sorting by Date:

SELECT * FROM SortDemo
ORDER BY Rank, Date

Date            Id    Rank
--------------  ----  ----
null            1     1 
null            2     2 
11/1/2017       null  3
11/4/2017       3     3     
null            7     7
null            8     8 
null            9     9 
11/5/2017       null  10
11/12/2017      10    10    
11/13/2017      null  10

Solution complete. Sql Fiddle

The answer is in escrow until Monday, so i can give people the opportunity to earn reputation for solving a unique problem.

Ian Boyd
  • 246,734
  • 253
  • 869
  • 1,219