0

I'm working in SQL Server 2012. I'm cleaning data, and am trying to break ties in an address-assigning algorithm randomly. This requires me to generate random numbers. The approach taken by this answer doesn't work because the WITH subquery seems to regenerate each side of the join, resulting in different random numbers for the same row.

This code replicates the problem:

WITH
initial_table AS (
    SELECT 1 AS id
),
random_values_added AS (
  SELECT id, ABS(CHECKSUM(NEWID())) % 10 AS rand
  FROM initial_table
)
SELECT t1.id, t2,id, t1.rand, t2.rand
FROM random_values_added t1 FULL OUTER JOIN random_values_added t2
ON t1.id = t2.id AND t1.rand = t2.rand
    
Wilbur
  • 457
  • 1
  • 5
  • 14

3 Answers3

1

As the CTE is run dynamically, when you call the join condition, you can store the intermediate results in a temporary table and leverage it for your query.

WITH
initial_table AS (
    SELECT 1 AS id
    union all
    SELECT 2
),
random_values_added AS (
  SELECT id, newid() AS rand
  FROM initial_table
)
select * into #RANDTable FROM random_values_added


SELECT t1.id, t2.id, t1.rand, t2.rand
FROM #RANDTable t1 
FULL OUTER JOIN #RANDTable t2
ON t1.id = t2.id AND t1.rand = t2.rand

+----+----+--------------------------------------+--------------------------------------+
| id | id |                 rand                 |                 rand                 |
+----+----+--------------------------------------+--------------------------------------+
|  1 |  1 | 3E366071-CD5A-4B1F-A043-831FE13B7102 | 3E366071-CD5A-4B1F-A043-831FE13B7102 |
|  2 |  2 | CC53C2F8-1C06-4BD8-8111-7133B45F4B79 | CC53C2F8-1C06-4BD8-8111-7133B45F4B79 |
+----+----+--------------------------------------+--------------------------------------+

Venkataraman R
  • 12,181
  • 2
  • 31
  • 58
1

You can use a trick using rand(). Assuming that id is unique, you can start with:

WITH initial_table AS (
      SELECT 1 AS id
     ),
     random_values_added AS (
      SELECT id, FLOOR(RAND(id) * 10) AS rand
      FROM initial_table
    )

This is not totally satisfying because it returns the same value every time the code is run. For that, we can use the fact that getdate() is evaluated once per query:

WITH initial_table AS (
      SELECT 1 AS id
     ),
     random_values_added AS (
      SELECT id, FLOOR(RAND(DATEADD(second, id, GETDATE()) * 10) AS rand
      FROM initial_table
    )
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

Can you please try removing the self join?

  SELECT id, ABS(CHECKSUM(NEWID())) % 10 AS rand
  FROM your_table 

Can you also try the rand() function if it works in SQL Server 2012?

Dale K
  • 25,246
  • 15
  • 42
  • 71
Koushik Roy
  • 6,868
  • 2
  • 12
  • 33