1

My question is an extension of the question asked here: How to request a random row in SQL?

What is the extension? Let's say is given a table of users, with this structure:

id (int) | name | email | phone number (char)

If I want to select few random users I have this query:

SELECT TOP @X column FROM table ORDER BY NEWID()

The extension I need is to give priority to users that have phone number (NOT NULL). The users should be randomly selected, but the one with phone number (if any) should come first, and the total number of returned rows should still be @X.

Thank you.

GMB
  • 216,147
  • 25
  • 84
  • 135
Zelter Ady
  • 6,266
  • 12
  • 48
  • 75

3 Answers3

3

You can add a specific component to the order by clause that puts the appropriate phone numbers first e.g.,

SELECT TOP @X column 
FROM table 
ORDER BY 
    CASE WHEN [Phone Number] IS NOT NULL THEN 0 ELSE 1 END,
    NEWID();

This sorts all those with phone numbers in one bucket (0) and those without phone numbers into another bucket (1). They are then still randomised within that bucket.

seanb
  • 6,272
  • 2
  • 4
  • 22
  • 1
    You have accepted my answer here, but you may want to review. This one provides a different answer than @GMB's. On re-reading the question, GMB's probably fits your question better. The difference is easily demonstrated with an example. Say your dataset has 20 rows with phone numbers and 20 rows without, and you are selecting the top 10. GMB's will (probably) have a mix of those with phone numbers and without, but sorted with phone number first; mine would pull out 10 with phone numbers and none without. – seanb Oct 26 '20 at 00:05
  • Your answer is the solution I'm looking for. I need to select first all the users with phone numbers, randomly, then, if not enough (less than @X) add more random rows from users without phone number. Thank you. – Zelter Ady Oct 26 '20 at 00:15
2

If I follow this correctly, you want two levels of sorting:

select column
from (select top @x * from table order by newid()) t
order by case when phone_number is not null then 0 else 1 end

The subquery selects @x random rows, then the outer query puts rows whose phone_number is not null first (if any).

GMB
  • 216,147
  • 25
  • 84
  • 135
  • Your answer doesn't respond to request. Your query selects random rows and sort them by my citerea. But the request was to select random rows with phone not null, and if not enough, add more rows, random, from those without phone number. I think @seanb posted the correct answer. – Zelter Ady Oct 26 '20 at 00:13
  • 1
    @ZelterAdy: if that's what you want, then seanb's query does what you want indeed, and not mine. I had a different understanding of your question, which seanb actually explained very well as a comment to their answer. – GMB Oct 26 '20 at 00:15
  • 1
    As I said in the comment on my answer - @GMB and I understood the question in different ways, and I think (upon re-reading the question), that this was based on a better reading. This answer then gives an appropriate answer to that. So yeah, I'm upvoting this one, even though it's not actually what the OP finally needed. – seanb Oct 26 '20 at 00:28
0
SELECT TOP(3)
      *
FROM (
    VALUES
    (1001, 'fred'  , '999 999 999'),
    (2001, 'barney', '888 888 888'),
    (3001, 'betty' , ''),
    (4001, 'wilma' , '')
    ) driver( id, name, phone)
cross apply (
    SELECT
          BINARY_CHECKSUM(phone) x1
        , BINARY_CHECKSUM(id, name, phone, NEWID()) x2
        , ABS(BINARY_CHECKSUM(BINARY_CHECKSUM(phone), NEWID())) x3
        ) ca (x1, x2, x3)
ORDER BY x3 DESC
  id | name  | phone       |        x1 |          x2 |         x3
---: | :---- | :---------- | --------: | ----------: | ---------:
3001 | betty |             |         0 | -1025202521 | 2016765040
1001 | fred  | 999 999 999 | 984285186 |  1620969732 | 1872682014
4001 | wilma |             |         0 | -1149900698 | 1518182428

db<>fiddle here

Please try running the sample multiple times to see the different results.

Using ABS() and BINARY_CHECKSUM() with differing columns allows you to arrive at ways to alter the order to affect the randomness.

Note there is a strong performance cost when ordering by newid(0 see: https://learn.microsoft.com/en-us/previous-versions/software-testing/cc441928(v=msdn.10)

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51