Random Variable Selection in SQL

Question

I have seen a few questions like this, but nothing has answered what I'm looking for.

I have 5,000 rows of data from over 3 years. Every line has a memberID, so memberIDs repeat and are only unique to an individual (but they will repeat in the column if the individual is in the system multiple times over 3 years).

How can I pull 100 random memberIDs over the course of 3 years? (So theoretically there would be more than 100 lines because memberIDs can repeat)

EDIT: I should clarify, Member ID is character, not numeric. Ex: W4564

NOTE: This is NOT looking for n rows, rather 100 different IDs over the course of 3 years, so an ID might be associated with 3 rows in the result. The result will have a differing number of rows each time the SQL is run.

possible duplicate of [Select n random rows from SQL Server table](http://stackoverflow.com/questions/848872/select-n-random-rows-from-sql-server-table) — Holmes IV, Jul 13 '15 at 18:34
I think the answers provided at the http://stackoverflow.com/questions/848872/select-n-random-rows-from-sql-server-table link can clarify you for getting the correct answer — Roberto Tellez Ibarra, Jul 13 '15 at 18:39

dub stylee · Answer 1 · 2015-07-13T18:52:08.227

2

Depending on how your data is indexed, you could simply grab the rows with the memberID from a subquery. For example:

SELECT *
FROM <yourtable>
WHERE memberID IN (SELECT DISTINCT TOP 100 memberID FROM <yourtable>)

That should return random memberIDs, depending on your index. If you need to force it, you can do like in the linked question in the comments, and sort it randomly:

SELECT *
FROM <yourtable>
WHERE memberID IN (SELECT DISTINCT TOP 100 memberID FROM <yourtable> ORDER BY newid())

edited Jul 13 '15 at 18:52

answered Jul 13 '15 at 18:36

dub stylee

3,252
5
38
59

So close. But this won't get 100 *distinct` values of `memberId`. Hint: you need to do something in case there are duplicates in the "top 100". – Gordon Linoff Jul 13 '15 at 18:46
True, I suppose that `SELECT DISTINCT TOP 100` would be more appropriate :) – dub stylee Jul 13 '15 at 18:51
But would I want distinct if the IDs are allowed to repeat (I essentially want all interactions associated with a member ID over a 3 year period) – Mere Jul 13 '15 at 20:06
@Mere - yes, because the subquery is just getting you the 100 IDs, the outer query will still pick multiple rows per ID if there are multiples for each ID. Without the `DISTINCT`, you could end up with anywhere from 1-100 member IDs, and your end result may have less than 100 rows. – dub stylee Jul 13 '15 at 20:56
Great. Please be sure to upvote and/or accept an answer if you found it helpful. – dub stylee Jul 14 '15 at 22:19

GolezTrol · Answer 2 · 2015-07-13T18:48:09.303

Using order by newid() you can use a random sorting. Using where exists you can isolate only those members for which data exists in the past three years. You need to do that in this stage, otherwise you might accidentally end up with only members that don't have any recent data at all. By adding top 100 you can select just 100 rows out of the set.

The combination should get the 100 random member ids for which data exists in the past three years:

select top 100
  m.MemberID
from
  Member m
where
  exists (select 'x' 
          from MemberData d 
          where d.MemberId = m.MemberId
                and d.DataDate > dateadd(year, -3, getdate()))
order by 
  newid()

Then you could use that query in an in clause to get data from the same MemberData table, or any other table for that matter:

select
  md.*
from
  MemberData md
where
  -- Same filter to get only the recent data
  md.DataDate > dateadd(year, -3, getdate()) and
  -- Only of 100 random members that have been active in the past 3 years.
  md.MemberId in (
    select top 100
      m.MemberID
    from
      Member m
    where
      exists (select 'x' 
              from MemberData d 
              where d.MemberId = m.MemberId
                    and d.DataDate > dateadd(year, -3, getdate()))
    order by 
      newid()
  )

Random Variable Selection in SQL

2 Answers2