0

I have a data set, where we have 100,000 customer. The sum of customers revenue is 2 million.

Now, I need 4 batches from 100,000 customer i.e. 25000 each which sum to revenue of 0.5 million.

Please can you help me sql script for this

Aadil Anis
  • 1
  • 1
  • 3

2 Answers2

0

In order to extract batches you can use CTE with ROW_NUMBER() :

with cte as 
( select field1, field2, row_number() over(order by customerid) as rownum
from table)
select * from cte where rownum < 25000

Please note that this will not guarantee that the amount of revenue is 0.5

You can check the revenue by adding

 with cte as 
    ( select field1, field2, row_number() over(order by customerid) as rownum
    from table)
select sum(revenue) from cte
where rownum < 25000
0

Your best bet is probably just to put the customers into four randomized groups. With the numbers you are talking about, the values will be pretty close:

select t.*,
       ntile(4) over (order by newid()) as grp
from t;

A more canonical answer is to do a stratified sample. In general, this will do a very good job:

select t.*,
       (row_number() over (order by revenue) % 4) as grp
from t;

This is not perfect. However, there is no perfect answer to your question. After all, one customer could have half the revenue, making it impossible to answer your question with equal-sized and equal-revenue bins.

The stratification method can actually be refined for continuous values. The idea is to break the customers into n/4 bins based on revenue and then choose randomly in each group:

select t.*,
       (row_number() over (partition by bin order by newid()) % 4 as grp
from (select t.*,
             (row_number() over (order by revenue) / 4) as bins
      from t
     ) t;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786