0

I'm new to SSIS and complex sql statements. I'm trying to combine my query that returns people with a date in the last month, but then take those results and get 5 truly random rows from that.

My query before I get the 5 random rows is:

SELECT 
    DISTINCT
    isnull(dbo.fnRemovePatternFromString(p.Last_Name, '%[,-.'']%'), '')  as [Last]
    ,isnull(dbo.fnRemovePatternFromString(p.First_Name, '%[,-.'']%'), '') as [First]
    ,isnull(dbo.fnRemovePatternFromString(p.Middle_Initial, '%[,-.'']%'), '') as [Middle]
    ,isnull(pf.Date_on_staff, '') as [Date on Staff]
    ,pf.Status_from_date
FROM person p
inner JOIN person_facilities pf ON p.Person_ID = pf.Person_ID
LEFT JOIN usr_FacultyMember fm ON p.Person_ID = fm.person_id


WHERE 
    pf.FacCode in ('s', 'H', 'E')
and 
    (
      pf.Status_from_date >= (getdate()-31 ) 
      and pf.Status_from_date < getdate()
    )

That returns about 300 rows, but it will change each month.

From that result, I need to get 5 random rows out of the results. It can't be the top of the results, or the 5 rows at 1/5 of the results. I'd like to avoid creating a view to store the results in like this example.

Tablesample doesn't look like what I need since it's not random.

I'm trying to apply this example, since it seems random, but I need to control the number of rows in the output. This has a similar answer. This is the part I like from those 2 examples:

(abs(cast((binary_checksum(*) * rand()) as int)) % 100) <10

I'm not sure how to control the output so I only get 5 rows from that. I was thinking of creating a variable and storing what the 10 needs to be to return 5 rows, but I'm not sure how to do that. Any thoughts?

I was looking at variable for count of rows but I'm not sure what I'd do with it or how I'd apply it in my rand line.

I guess it doesn't have to be one sql query for this, but I'm still not sure how to do this in ssis. If I have to create a view, I can, but I'd rather not clutter my db with views.

Any help solving this complicated question would be greatly appreciated.

Update: This is different than random-number-on-sql-without-using-newid because I'm not ruling out order by newid, I'm just trying to find a random 5 rows returned.

Michele
  • 3,617
  • 12
  • 47
  • 81
  • Possible duplicate of [Random Number on SQL without using NewID()](https://stackoverflow.com/questions/2247554/random-number-on-sql-without-using-newid) – Himanshu Sep 20 '19 at 14:59

2 Answers2

1

There could be multiple logic's to implement some randomization I thought of a modulo on column id/dates.

 SELECT 
        DISTINCT
        isnull(dbo.fnRemovePatternFromString(p.Last_Name, '%[,-.'']%'), '')  as [Last]
        ,isnull(dbo.fnRemovePatternFromString(p.First_Name, '%[,-.'']%'), '') as [First]
        ,isnull(dbo.fnRemovePatternFromString(p.Middle_Initial, '%[,-.'']%'), '') as [Middle]
        ,isnull(pf.Date_on_staff, '') as [Date on Staff]
        ,pf.Status_from_date
    FROM person p
    inner JOIN person_facilities pf ON p.Person_ID = pf.Person_ID
    LEFT JOIN usr_FacultyMember fm ON p.Person_ID = fm.person_id


    WHERE 
        pf.FacCode in ('s', 'H', 'E') order by MOD( sysdate-date,2 )
Himanshu
  • 3,830
  • 2
  • 10
  • 29
0

Why not just use top/fetch and order by newid()?

select top (5) t.*
from (<your query here>) t
order by newid();
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Because the requirement is the rows returned need to be random. Top won't give random rows. I think order by newid just takes the top 5 from the select and orders them differently, right? – Michele Sep 20 '19 at 14:54
  • I was reading more about newid, and it does look like it's returning 5 random rows from the output in the description. Thanks! – Michele Sep 20 '19 at 15:06