6

I am trying to pregenerate some alphanumeric strings and insert the result into a table. The length of string will be 5. Example: a5r67. Basically I want to generate some readable strings for customers so they can access their orders like www.example.com/order/a5r67. Now I have a select statement:

;WITH 
    cte1 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
    cte2 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
    cte3 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
    cte4 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t)),
    cte5 AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t))
INSERT INTO ProductHandles(ID, Used)
SELECT cte1.t + cte2.t + cte3.t + cte4.t + cte5.t, 0
FROM cte1
CROSS JOIN cte2
CROSS JOIN cte3
CROSS JOIN cte4
CROSS JOIN cte5

Now the problem is I need to write something like this to get a value from the table:

SELECT TOP 1 ID 
FROM ProductHandles
WHERE Used = 0

I will have index on the Used column so it will be fast. The problem with this is that it comes with order:

00000
00001
00002
...

I know that I can order by NEWID(), but that will be much slower. I know that there is no guarantee of ordering unless we specify Order By clause. What is needed is opposite. I need guaranteed chaos, but not by ordering by NEWID() each time customer creates order.

I am going to use it like:

WITH cte as (
                SELECT TOP 1 * FROM ProductHandles WHERE Used = 0
                --I don't want to order by newid() here as it will be slow
            )
UPDATE cte 
SET Used = 1
OUTPUT INSERTED.ID
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
Giorgi Nakeuri
  • 35,155
  • 8
  • 47
  • 75
  • if you want chaos why do you care which ID they get, as long as it's not used? It looks like you are trying to create a simplified GUID – S3S Nov 01 '18 at 13:32
  • It is a requirement. It is not my willing. Yes something like a `readable guid`. – Giorgi Nakeuri Nov 01 '18 at 13:33
  • Well, you could randomly order by a substring of the id... not sure it would perform well but it'd be random `DECLARE @sort int = (SELECT ROUND(((6 - 1 -1) * RAND() + 1), 0)) ORDER BY substring(ID,@sort,5)` – S3S Nov 01 '18 at 13:41
  • @scsimon, no there will be millions of rows. It will have first to select all and then order. It will be slow. – Giorgi Nakeuri Nov 01 '18 at 13:43
  • Add a `seq` to the `ProductHandles` and when inserting to it order by `NEWID()`. in the `cte` `TOP 1 ORDER BY seq`. The `slowness` is only when you create the `ProductHandles` – Squirrel Nov 01 '18 at 13:43
  • 1
    is this to satisfy a "do not issue sequential/easily guessed identifiers for orders" type of requirement? – Damien_The_Unbeliever Nov 01 '18 at 13:44
  • 3
    I don't think there is a way to pull a random row out of a table without using that order by. Maybe instead you could make the insert itself more random and get the top value ordered by ID? – Sean Lange Nov 01 '18 at 13:44

3 Answers3

8

If you add an identity column to the table, and use order by newid() when inserting the records (that will be slow but it's a one time thing that's being done offline from what I understand) then you can use order by on the identity column to select the records in the order they where inserted to the table.

From the Limitations and Restrictions part of the INSERT page in Microsoft Docs:

INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted.

This means that by doing this you are effectively making the identity column ordered by the same random order the rows where selected in the insert...select statement.

Also, there is no need to repeat the same cte 5 times - you are already repeating the cross apply:

CREATE TABLE ProductHandles(sort int identity(1,1), ID char(5), used bit)


;WITH 
    cte AS(SELECT * FROM (VALUES('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'),('l'),('m'),('n'),('o'),('p'),('q'),('r'),('s'),('t'),('u'),('v'),('w'),('x'),('y'),('z')) AS v(t))        
INSERT INTO ProductHandles(ID, Used)
SELECT a.t + b.t + c.t + d.t + e.t, 0
FROM cte a
CROSS JOIN cte b
CROSS JOIN cte c
CROSS JOIN cte d
CROSS JOIN cte e
ORDER BY NEWID()

Then the cte can have an order by clause that guarantees the same random order as the rows returned from the select statement populating this table:

WITH cte as (
                SELECT TOP 1 * 
                FROM ProductHandles 
                WHERE Used = 0
                ORDER BY sort 
            )
UPDATE cte 
SET Used = 1
OUTPUT INSERTED.ID

You can see a live demo on rextester. (with only digits since it's taking too long otherwise)

Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • 1
    @ZoharPeled . . . I missed that. You *are* using an identity column. Kudos. – Gordon Linoff Nov 01 '18 at 16:27
  • Btw, there is a better (well, at least from the performance point of view) way to populate a random strings pool - Read my blog post for details: https://zoharpeled.wordpress.com/2019/09/15/pre-populate-a-random-strings-pool/ – Zohar Peled Oct 31 '19 at 09:33
2

Here's a slightly different option... Rather than trying to generate all possible values in a single sitting, you could simply generate a million or two at a time and generate more as they get used up. Using this approach, you drastically reduce the the initial creation time and eliminate the need to maintain the massive table of values, the majority of which, that will never be used.

CREATE TABLE dbo.ProductHandles (
    rid INT NOT NULL
        CONSTRAINT pk_ProductHandles 
        PRIMARY KEY CLUSTERED,
    ID_Value CHAR(5) NOT NULL
        CONSTRAINT uq_ProductHandles_IDValue 
        UNIQUE WITH (IGNORE_DUP_KEY = ON),      -- prevents the insertion of duplicate values w/o generating any errors.
    Used BIT NOT NULL
        CONSTRAINT df_ProductHandles_Used 
        DEFAULT (0)
    );

-- Create a filtered index to help facilitate fast searches
-- of unused values.
CREATE NONCLUSTERED INDEX ixf_ProductHandles_Used_rid    
    ON dbo.ProductHandles (Used, rid)
    INCLUDE(ID_Value)
WHERE Used = 0;

--==========================================================

WHILE 1 = 1     -- The while loop will attempt to insert new rows, in 1M blocks, until required minimum of unused values are available.
BEGIN 
    IF (SELECT COUNT(*) FROM dbo.ProductHandles ph WHERE ph.Used = 0) > 1000000     -- the minimum num of unused ID's you want to keep on hand.
    BEGIN
        BREAK;
    END;
    ELSE 
    BEGIN
        WITH 
            cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), 
            cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
            cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
            cte_Tally (n) AS (
                SELECT TOP (1000000)    -- Sets the "block size" of each insert attempt.
                    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
                FROM
                    cte_n3 a CROSS JOIN cte_n3 b
                )
        INSERT dbo.ProductHandles (rid, ID_Value, Used)
        SELECT 
            t.n + ISNULL((SELECT MAX(ph.rid) FROM dbo.ProductHandles ph), 0),
            CONCAT(ISNULL(c1.char_1, n1.num_1), ISNULL(c2.char_2, n2.num_2), ISNULL(c3.char_3, n3.num_3), ISNULL(c4.char_4, n4.num_4), ISNULL(c5.char_5, n5.num_5)),
            0
        FROM
            cte_Tally t
            -- for each of the 5 positions, randomly generate numbers between 0 & 36. 
            -- 0-9 are left as numbers. 
            -- 10 - 36 are converted to lower cased letters.
            CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n1 (num_1)
            CROSS APPLY ( VALUES (CHAR(CASE WHEN n1.num_1 > 9 THEN n1.num_1 + 87 END)) ) c1 (char_1)
            CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n2 (num_2)
            CROSS APPLY ( VALUES (CHAR(CASE WHEN n2.num_2 > 9 THEN n2.num_2 + 87 END)) ) c2 (char_2)
            CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n3 (num_3)
            CROSS APPLY ( VALUES (CHAR(CASE WHEN n3.num_3 > 9 THEN n3.num_3 + 87 END)) ) c3 (char_3)
            CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n4 (num_4)
            CROSS APPLY ( VALUES (CHAR(CASE WHEN n4.num_4 > 9 THEN n4.num_4 + 87 END)) ) c4 (char_4)
            CROSS APPLY ( VALUES (ABS(CHECKSUM(NEWID())) % 36) ) n5 (num_5)
            CROSS APPLY ( VALUES (CHAR(CASE WHEN n5.num_5 > 9 THEN n5.num_5 + 87 END)) ) c5 (char_5);
    END;
END;

After the initial creation, move the code in the WHILE loop to a stored procedure and schedule it to automatically run on a periodic basis.

Jason A. Long
  • 4,382
  • 1
  • 12
  • 17
  • Jason, are you sure this will not produce dups? – Giorgi Nakeuri Nov 02 '18 at 05:10
  • Dupes will be produced but the UNIQUE constraint on the table will reject them. If you run the code, you see a message that looks like the following: "Duplicate key was ignored. (991691 rows affected) Duplicate key was ignored. (975541 rows affected) " – Jason A. Long Nov 02 '18 at 12:37
0

If I'm understanding this right, It looks like your attempting to separate the URL/visible data from the DB record ID, as most apps use, and provide something that is not directly related to an ID field that the user will see. NEWID() does allow control of the number of characters so you could generate a smaller field with a smaller index. Or just use a portion of the full NEWID()

SELECT CONVERT(varchar(255), NEWID())
SELECT SUBSTRING(CONVERT(varchar(40), NEWID()),0,5)

You might also want to look at a checksum field, I don't know if its faster on indexing though. You could get crazier by combining random NEWID() with a checksum across 2 or 3 fields.

SELECT BINARY_CHECKSUM(5 ,'EP30461105',1)
DBTales
  • 91
  • 8