0

SQL Server's SQL has become so clever that what looks like it might require a procedural solution can often be done with pure SQL. I am wondering if this is one of those times.
Let's say we have a STATES table and a CITIES table.

STATES:
State:  NY

CITIES
State: NY
City:  Armonk

Now let's complicate things with a third table: INSTRUCTIONS

INSTRUCTIONS
State: NY
HowMany: 17

State: NJ
HowMany: 11

Is there any way in SQL Server SQL to select HowMany cities at random from the CITIES table when the three tables are joined on State?

We don't know the "top N" in advance. It changes by state.

Of course, the States table will have all the 50 states, the Cities table all the cities in each state, and Instructions will have one record per state, identifying how many cities from that state are needed (chosen randomly).

P.S. Sample desired results (assuming the Instruction for NY is HowMany=5 and the Instruction for NJ is HowMany = 4, and order by STATES.state):

NJ.....Princeton
NJ.....Newark
NJ.....Camden
NJ.....Princeton
NY.....Armonk
NY.....Schenectady
NY.....White Plains
NY.....Niagara Falls
NY.....Rochester
Tim
  • 8,669
  • 31
  • 105
  • 183

3 Answers3

1
with
  states as (
    select 'NY' state union
    select 'NJ' state
  ),
  instructions as (
    select 'NY' state, 2 howmany union
    select 'NJ' state, 3 howmany
  ),
  cities as (
    select 'NJ' state,'Princeton' city union
    select 'NJ' state,'Newark' city union
    select 'NJ' state,'Camden' city union
    select 'NJ' state,'Hamilton' city union
    select 'NY' state,'Armonk' city union
    select 'NY' state,'Schenectady' city union
    select 'NY' state,'White Plains' city union
    select 'NY' state,'Niagara Falls' city union
    select 'NY' state,'Rochester' city
  ),
  cities_rnd as (
    select c.*,rand() rnd from cities c
  ),
  cities_ranked as (
    select c.*, dense_rank() over (partition by c.state order by c.rnd) rank from cities_rnd c
  )
select c.*,i.howmany
from cities_ranked c
join instructions  i on i.state=c.state
join states        s on s.state=c.state --needless line
where c.rank <= i.howmany;
Kjetil S.
  • 3,468
  • 20
  • 22
  • Error: `The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP, OFFSET or FOR XML is also specified.` – Tim Jul 20 '17 at 12:36
1

I find the RAND() function used in the other answers causes some problems by not being a new random number for each row.

CHECKSUM(NEWID()) worked well for me in this scenario. (See RAND not different for every row in T-SQL UPDATE)

I think this solution is nice and tidy:

SELECT
RandomCities.[State]
,[RandomCities].City
FROM
    (
        SELECT 
        s.[state]
        ,city
        ,ROW_NUMBER() OVER (PARTITION BY s.[State] ORDER BY CHECKSUM(NEWID())) AS [RandomOrder]
        FROM
        States s
        INNER JOIN Cities c ON c.[state]=s.[state]
    ) AS RandomCities
INNER JOIN instructions i ON i.[state]=RandomCities.[state]
WHERE RandomCities.RandomOrder<=i.HowMany
Max xaM
  • 348
  • 2
  • 12
0

DIfferent then the other answer using a query that does not use DDL (Data Definition Language)

SET @row_num2= 0;
SELECT *,@row_num2 := @row_num2+1 as rownum2 FROM (SELECT States.State,Cities.City,Instructions.HowMany
from States,Cities, Instructions
WHERE States.State = Cities.State and States.State = Instructions.State
ORDER BY RAND()) as t HAVING rownum2 >= t.HowMany

http://sqlfiddle.com/#!9/b96d3b/37

Joshua Klein
  • 189
  • 2
  • 13