0

How to generate 6 digit unique alphanumeric string with 6- character length, case nonsensitive for 4 million records. By replacing 1’s, I’s, O’s, and 0’s.

I have tried using the below query but the problem is when I am trying to replace the above values the unique id has some duplicate values.

** select CAST(REPLACE(REPLACE(CHAR( ASCII('AA')+(ABS(CHECKSUM(NEWID()))%25)) , 'O', ''), 'I', '')

  • REPLACE(REPLACE(REPLACE( REPLACE(SUBSTRING(CONVERT(varchar(60), NEWID()),1, 10) , '-',''), '.' , ''), '0' , ''), '1','') AS nvarchar (6)) , employee_id from cte **

The final output should be something like:

UID      em_id
AB1267    123
A34BF7    456 .... .... .... ... .... .... .... ... .... .... ...

H K
  • 11
  • 1
  • 3

5 Answers5

0

How to generate 6 digit unique alphanumeric string with 6- character length, case nonsensitive for 4 million records.

There are enough hex digits to do what you want. So, one option is:

select right('ZZZZZZ' + format(row_number() over (order by newid()), 'X'), 6)

This generates a sequential number (randomly), converts it to hex, and then prepends Zs.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • This does generate 0`s and 1`s – H K Sep 22 '20 at 00:00
  • @HarishK . . . This answers the question as I understand it. To be honest, I have no idea what the second sentence is supposed to mean. – Gordon Linoff Sep 22 '20 at 00:03
  • @GordonLinoff I believe the OP wants unique [youtube-like ids](https://stackoverflow.com/q/3990820/11683), with the restriction that they should not contain characters `1`, `0`, `I` and `O`. – GSerg Sep 22 '20 at 18:02
0

If you want the UIDs to appear to be random (e.g., 1st could be G5K2M5, second 23BN32, etc), I think you basically have three choices

  • (In a loop) randomly generating UIDs, remove those that a) already exist, and b) have duplicates in your generated list, then insert the unique UIDs. Repeat until you have none left.
  • Generate a table with all possible UIDs (e.g., all letters and numbers except 1, I, L, 0, o - note I've added L to the list as lowercase l looks like I or 1). That means 31 possible characters in 6 slots... 31^6 is approximately 900 million possibilities. For the UIDs to use, randomly select the number needed from the UID list, assign them as needed, then remove them from the list so you won't get doubles.
  • Use a formula where each number is uniquely mapped to a UID. Then just get the rownumber or other unique int identifier, and calculate the UID from it. Note that the formula could be a mathematical formula, or could just be a table (as above) where the UIDs are initially randomly sorted, and you just take the UID from the relevant rownumber.
seanb
  • 6,272
  • 2
  • 4
  • 22
0
select top (100000)
    cte.*, 
    concat
    (   
        substring(s.random32, p.p1, 1),
        substring(s.random32, p.p2, 1),
        substring(s.random32, p.p3, 1),
        substring(s.random32, p.p4, 1),
        substring(s.random32, p.p5, 1),
        substring(s.random32, p.p6, 1)
    ) as combo6

from
--employees
(
    --4mil employees
    select top (4000000) 
    row_number() over(order by @@spid) as empid, --this could be empid, eg. empid as n
    a.name as empfirstname, a.name as emplastname, b.type_desc as emptype
    from sys.all_objects as a
    cross join sys.all_objects as b
) as cte
--random string
cross join
(
    --one random string  (excluding 1, 0, I, O) 
    select top (1)
    (
        select v.v as '*'
        from 
        (values 
                  ('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),
            ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),
            ('J'),('K'),('L'),('M'),('N'),      ('P'),('Q'),('R'),
            ('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
        ) as v(v)
        order by newid()
        for xml path('')
    ) as random32
) as s
--combo6 positions in string
cross apply
(
    select
        /*for 32 chars = len(rand32) */
        (power(32,0)+(cte.empid-1)%power(32, 1))/power(32,0) as p1,
        (power(32,1)+(cte.empid-1)%power(32, 2))/power(32,1) as p2,
        (power(32,2)+(cte.empid-1)%power(32, 3))/power(32,2) as p3,
        (power(32,3)+(cte.empid-1)%power(32, 4))/power(32,3) as p4,
        (power(32,4)+(cte.empid-1)%power(32, 5))/power(32,4) as p5,
        (power(32,5)+(cte.empid-1)%power(32, 6))/power(32,5) as p6
) as p
go

....or....(?)

create or alter function dbo.[why?]()
returns char(6)
as
begin
    declare @combo6 char(6);
    declare @randomstring char(32) = cast(session_context(N'randomstring') as char(32));

    if @randomstring is null
    begin
        select @randomstring =
        (
            select v.v as '*'
            from 
            (values 
                      ('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),
                ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),
                ('J'),('K'),('L'),('M'),('N'),      ('P'),('Q'),('R'),
                ('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
            ) as v(v)
            order by checksum(@@idle, @@cpu_busy, (select max(last_request_end_time) from sys.dm_exec_sessions where session_id=@@spid), v.v)
            for xml path('')
        );
    end

    declare @randomnumber int = 1 + isnull(cast(session_context(N'randomnumber') as int), abs(checksum(@randomstring))%10000000);


    select @combo6 = concat(    
                            substring(@randomstring, p.p1, 1),
                            substring(@randomstring, p.p2, 1),
                            substring(@randomstring, p.p3, 1),
                            substring(@randomstring, p.p4, 1),
                            substring(@randomstring, p.p5, 1),
                            substring(@randomstring, p.p6, 1)
                            )
    from
    (
        select
            /*for 32 chars = len(rand32) */
            (power(32,0)+(@randomnumber-1)%power(32, 1))/power(32,0) as p1,
            (power(32,1)+(@randomnumber-1)%power(32, 2))/power(32,1) as p2,
            (power(32,2)+(@randomnumber-1)%power(32, 3))/power(32,2) as p3,
            (power(32,3)+(@randomnumber-1)%power(32, 4))/power(32,3) as p4,
            (power(32,4)+(@randomnumber-1)%power(32, 5))/power(32,4) as p5,
            (power(32,5)+(@randomnumber-1)%power(32, 6))/power(32,5) as p6
    ) as p;

    
    exec sp_set_session_context @key=N'randomstring', @value=@randomstring;
    exec sp_set_session_context @key=N'randomnumber', @value=@randomnumber;

    return(@combo6);
end
go


exec sp_set_session_context @key=N'randomstring', @value=null;
exec sp_set_session_context @key=N'randomnumber', @value=null;
go

select top (100000) dbo.[why?]() as empid, a.name, b.object_id
from sys.all_objects as a
cross join sys.all_objects as b
go

--drop function dbo.[why?]
lptr
  • 1
  • 2
  • 6
  • 16
0

The difficulty reside on the UNIQUE feature of the string. Some solutions that have been shown cannot guarantee the uniqueness of the generated strings. First solution of lptr does not give always 6 letters and give some duplicates.

My solution give the full requirement, but it is slow :

WITH TAZ AS
(SELECT CAST('A' COLLATE Latin1_General_BIN AS CHAR(1)) AS LETTER, ASCII('A') AS CAR
 UNION ALL
 SELECT CAST(CHAR(CAR + 1)  COLLATE Latin1_General_BIN  AS CHAR(1)), CAR + 1
 FROM   TAZ
 WHERE  CHAR(CAR + 1) <= 'Z'
)
SELECT TOP 4000000 
       T1.LETTER + T2.LETTER + T3.LETTER + T4.LETTER + T5.LETTER + T6.LETTER AS L6
FROM TAZ AS T1
     CROSS JOIN TAZ AS T2
     CROSS JOIN TAZ AS T3
     CROSS JOIN TAZ AS T4
     CROSS JOIN TAZ AS T5
     CROSS JOIN TAZ AS T6
ORDER BY NEWID()

One thing I do in such a case is to compute all the 6 length strings possible and store it in a plain table stored in a compressed mode and in a read only storage (a tally table). The table has an ID and an extra column of bit type with the 0 value.

When you want to attribute some 6 chars string values, you just pickup from the table and marks it with the bit modify to 1.

As an information, this is the way that referenced ticket file are givent to customer in the french national railway compagny call SNCF since a long time.

SQLpro
  • 3,994
  • 1
  • 6
  • 14
  • It's the `order by newid()` that's making it slow. from all the solutions I've tested, including Gordon's and myself, without the `order by` yours is the fastest (well, a slightly modified version of it, anyway). – Zohar Peled Sep 23 '20 at 14:03
0

First, let me start by saying you already have some fine answers here, the only problem is with them is that they are slow. My suggested solution is fast - even very fast in comparison.

A year ago I've written a blog post entitled How to pre-populate a random strings pool that was based on an answer written by Martin Smith to How can I generate random strings in TSQL. I've basically took the code posted in that answer and wrapped it up inside a inline table valued function.

For this problem, I've taken that function and modified it ever so slightly to better fit your requirements - mainly the number of random strings (original version can produce up to 1,000,000 rows only) and the case-insensitivity.

Tests I've made comparing the speed of execution between Gordon's SQLpro's, lptr's answers and my own showed conclusively that this is the best solution between all four, at least in terms of execution speed.

So, without further ado, here's the code:

First, the function and it's auxiliary view:

-- This view is needed for the function to work. Read my blog post for details.
CREATE VIEW dbo.GuidGenerator
AS
    SELECT Newid() As NewGuid;
GO

-- slightly modified version to enable the generation of up to 100,000,000 rows.
CREATE FUNCTION dbo.RandomStringGenerator
(
    @Length int,
    @Count int -- Note: up to 100,000,000 rows
)
RETURNS TABLE
AS 
RETURN 

WITH E1(N) AS (SELECT N FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) V(N)),   -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E4(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E4 a, E4 b) -- 100,000,000

SELECT TOP(@Count) 
    N As Number,
    (
    SELECT  TOP (@Length) CHAR(
            CASE Abs(Checksum(NewGuid)) % 2
                WHEN 0 THEN 65 + Abs(Checksum(NewGuid)) % 26 -- Random upper case letter
                ELSE 48 + Abs(Checksum(NewGuid)) % 10 -- Random digit
            END
            )
    FROM Tally As t0  
    CROSS JOIN GuidGenerator 
    WHERE  t0.n <> -t1.n
    FOR XML PATH('')
    ) As RandomString
FROM Tally As t1

GO

Then, using distinct, top 4000000 and a simple where clause - select the random strings you want:

SELECT DISTINCT TOP 4000000 Number, RandomString
FROM  dbo.RandomStringGenerator(6,100000000)
WHERE RandomString NOT LIKE '%[IiOoLl01]%' -- in case your database's default collation is case sensitive...

The reason this is the fastest solution is very simple - My solution already generates the strings randomly, so I don't need to also sort them randomly - which is the biggest bottle neck of the other suggested solutions.
If you don't need the order to be random, you can go with SQLpro's solution, just remove the order by newid() - that was the fastest solution (though it didn't filter out the unwanted chars)

Update
As requested by lptr - here's an example on how to select the random strings and another table as well:

WITH Tbl AS
(
    SELECT *, ROW_NUMBER() OVER (ORDER BY Em_Id) As Rn
    FROM <TableNameHere>
), Rnd AS
(
    SELECT DISTINCT TOP 4000000 ROW_NUMBER() OVER (ORDER BY Number) As Rn, RandomString
    FROM  dbo.RandomStringGenerator(6,100000000)
    WHERE RandomString NOT LIKE '%[IiOoLl01]%'
)

SELECT Em_Id, RandomString
FROM Tbl
INNER JOIN rnd
    ON Tbl.Rn = Rnd.Rn

Notes:

  1. Change <TableNameHere> to the actual table name
  2. You can use any column (or constant) for the order by of the row number, it doesn't matter because the order is irrelevant here anyway.
Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
  • the requirement is to generate random unique values FOR 4 million rows. This is not exactly the same as generating 4million unique values (in a vacuum). How long does it take to select a couple of columns + unique string of your function from a 4 million table? – lptr Sep 23 '20 at 17:46
  • @Iptr I don't know, didn't try that - but neither did your code. You didn't select from a 4 million rows table, you've generated pseudo-data and that's fine - but if you're going to be petty about the requirement, you should ask the OP to provide proper sample data so that we can use that in our answers. Still, the main benefit in my approach is the fact you don't have to randomly sort the records because they are already random. If that's even required? I don't know. – Zohar Peled Sep 23 '20 at 18:27
  • why don't you try it? because as it is, you are comparing two approaches/suggestions/answers that don't do the same thing (and the comparison becomes slightly pointless) – lptr Sep 23 '20 at 20:53
  • @lptr Well, I've tested it. Still my way is faster. – Zohar Peled Sep 25 '20 at 15:15
  • nice! if you edited it (the "select" test script) into your answer it would be just (or very close to) what the OP is looking for. – lptr Sep 26 '20 at 07:36
  • @lptr I'm not sure if the OP wants an update or a select statement - however, since you insist on `select` - I've edited with an example. – Zohar Peled Sep 26 '20 at 11:23