2

Possible Duplicate:
How to randomly select rows in SQL?

Suppose that -- in Teradata 12 -- I have a table TXN that has Transaction_ID and Store_nbr. Suppose also that I have a table SAMP that has Store_nbr and n_samples.

Suppose I want to take a sample of n_samples Transaction_IDs from TXN. I.e., I want the SAMP table to be able to drive how many transaction_id samples are taken from the TXN table. E.g.,

TXN
Transaction_ID Store_nbr
1              1
2              1
1              2
2              2
3              2

...and...

SAMP
store_nbr n_samples
1         2
2         2
3         1

So how can I tell SQL (this is Teradata, btw): Give me 2 samples of transaction_Id for store 1, 2 samples of transaction_Id for store 2, and 1 sample of transaction_Id for store 3?

Teradata does have the conditional sample concept:

select a,b
from foo
sample when a=1 then 5
       when a=2 then 10
       when a=3 then 7
end

...which I could use a codegen method to create if needed (there are thousands of stores).

The SAMP table is free to change at any time.

Community
  • 1
  • 1
Chris
  • 1,421
  • 3
  • 18
  • 31
  • I have edited your title. Please see, "[Should questions include “tags” in their titles?](http://meta.stackexchange.com/questions/19190/)", where the consensus is "no, they should not". – John Saunders Dec 29 '12 at 00:54
  • I don't know why this was closed, it's not at all similar to the other question. Perhaps it can be asked again? Technically the question should be "how can I do stratified random sampling in Teradata". I do this myself, but using SAS not "pure" Teradata. – BellevueBob Dec 29 '12 at 16:17

1 Answers1

0

You can do this using row_number and random():

select t.*
from (select t.*,
             row_number() over (partition by store_nbr
                                order by random(1,100000)) as seqnum
      from txn t
     ) t join
     samp s
     on t.store_nbr = s.store_nbr and
        t.seqnum <= s.n_samples

I'm not aware of a simple syntax using sample to accomplish this.

Out of curiousity, would this be allowed?

select t.*
from (select t.*,
             row_number() over (partition by store_nbr
                                order by rnd) as seqnum
      from (select t.*, random(1,100000) as rnd from txn t) t
     ) t join
     samp s
     on t.store_nbr = s.store_nbr and
        t.seqnum <= s.n_samples
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786