0

I need to join table B to Table A, where Table B's records are randomly assigned, or joined. Most of the queries out there are based off of having a key between them and conditions, where I just want to randomly join records without a key.

I'm not sure where to start, as none of the queries I've found are doing this. I assume a nested join could be helpful for this, but how can I randomly assort the records on join?

**Table A**
| Associate ID| Statement|
|:----:  |:------:|
| 33691| John is |
| 82451| Susie is |
| 25485| Sam is|
| 26582| Lonnie is|
| 52548| Carl is|

**Table B**
| RowID | List|
|:----:  |:------:|
| 1| admirable|
| 2| astounding|
| 3| excellent|
| 4| awesome|
| 5| first class|

The result would be something like this, where items from the list are not looped through in order, but random:

**Result Table**
| Associate ID| Statement| List|
|:----:  |:------:|:------:|
| 33691| John is |astounding|
| 82451| Susie is |first class|
| 25485| Sam is|admirable|
| 26582| Lonnie is|excellent|
| 52548| Carl is|awesome|

These are some of the queries I've tried:

xQbert
  • 34,733
  • 2
  • 41
  • 62
Allen
  • 56
  • 2
  • 14
  • 1
    Assign a row number based on a a random order to both tables. and then join on the row number generated – xQbert Dec 28 '21 at 15:53
  • How would that work when having undetermined records in Table A (1-10,000), and a small amount in Table B? (1-50) – Allen Dec 28 '21 at 15:55
  • Wait... So each record in A must be paird with a random record in B? or should there be an equal number of record from B paird with a random record in A? (meaning if there are only 50 records in B, it would be 50*X and not the case where 1 record in B could be 100* more than another... Put another way. Are you after a true random pairing for each record in A to B or 50 records from B should be repeated X times until all records in A could be accounted for and then joined to A... So it would be like count(A)/Count(B) +1 * all records in B randomized then joined to A. – xQbert Dec 28 '21 at 15:58
  • a recursive CTE would allow you to repeat the 50 records N times: https://stackoverflow.com/questions/33327837/repeat-rows-n-times-according-to-column-value getting the count as described above and then assigning a row number would work; but this assumes an even distribution of records from B... Otherwise you could cross apply to B order by Rand() and get top 1. So... are you after Even distribution from B, or truly a random distribution from B? well... as truely random as Rand() function gives you... – xQbert Dec 28 '21 at 16:09
  • Are you after a Random Distribution from B or an Even Distribution from B? (Assuming `Rand()` is good enough for Random) – xQbert Dec 28 '21 at 16:13

2 Answers2

0

using 2 CTEs we generate a select which generates a row number for each table based on a random order and then join based on that row number.

Using a CTE to get N times the records in B as described here: Repeat Rows N Times According to Column Value (Not included below) Note to get the "N" you'll need to get count from A and B, then divide by eachother and Add 1.

Assuming Even Distribution

With A as(
SELECT *, Row_number() over (order by NewID()) RN
FROM A),
B as (
SELECT *, Row_number () over (order by NewID()) RN
FROM B)
SELECT *
FROM A
INNER JOIN B
 on A.RN = B.RN

Or use (assuming uneven distribution)

SELECT * 
FROM A
CROSS APPLY (SELECT TOP 1 * FROM B ORDER BY NewID()) Z
xQbert
  • 34,733
  • 2
  • 41
  • 62
  • 1
    `rand()` is a runtime constant and the same for all rows so `Row_number() over (order by rand())` doesn't apply any randomisation at all – Martin Smith Dec 28 '21 at 16:23
  • Hm... I guess I forot that NEWID then. though I'm not sure about performance... and duplication is possible... I guess the right method is: https://dba.stackexchange.com/questions/955/what-is-the-best-way-to-get-a-random-ordering – xQbert Dec 28 '21 at 16:27
0

This method assumes you know in advance which is the smaller table.

First it assigns an ascending row numbering from 1. This does not have to be randomized.

Then for each row in the larger table it uses the modulus operator to randomly calculate a row number in the range to join onto.

WITH Small
     AS (SELECT *,
                ROW_NUMBER() OVER ( ORDER BY (SELECT 0)) AS RN
         FROM   SmallTable),
     Large
     AS (SELECT *,
                1 + CRYPT_GEN_RANDOM(3) % (SELECT COUNT(*) FROM SmallTable) AS RND
         FROM   LargeTable
         ORDER  BY RND
         OFFSET 0 ROWS)
SELECT *
FROM   Large
       INNER JOIN Small
               ON Small.RN = Large.RND 

The ORDER BY RND OFFSET 0 ROWS is to get the random numbers materialized in advance.

This will allow a MERGE join on the smaller table. It also avoids an issue that can sometimes happen where the CRYPT_GEN_RANDOM is moved around in the plan and only evaluated once rather than once per row as required.

Martin Smith
  • 438,706
  • 87
  • 741
  • 845