0

I want to write a select query that selects distinct rows of data progressively. Explaining with an example, Say i have 5000 accounts selected for repayment of loan, these accounts are ordered in descending order( Account 1st has highest outstanding while account 5000nd will have the lowest).

I want to select 1000 unique accounts 5 times such that the total outstanding amount of repayment in all 5 cases are similar.

i have tried out a few methods by trying to select rownums based on odd/even or other such way, but it's only good for upto 2 distributions. I was expecting more like a A.P. as in maths that selects data progressively.

MT0
  • 143,790
  • 11
  • 59
  • 117
  • Look at https://asktom.oracle.com/pls/apex/asktom.search%3Ftag%3Dgetting-rows-n-through-m-of-a-result-set, however if your previously ordered your data won't be evenly distributed. – Pepe N O Feb 08 '23 at 15:18
  • @PepeNO The OP appears to be asking a bin packing problem (wanting the 5 sets of 1000 items to be of equal sizes) rather than a row limiting problem (i.e. `OFFSET n ROWS FETCH NEXT 1000 ROWS ONLY`). – MT0 Feb 08 '23 at 15:41

1 Answers1

0

A naïve method of splitting sets into (for example) 5 bins, numbered 0 to 4, is give each row a unique sequential numeric index and then, in order of size, assign the first 10 rows to bins 0,1,2,3,4,4,3,2,1,0 and then repeat for additional sets of 10 rows:

WITH indexed_values (value, rn) AS (
  SELECT value,
         ROW_NUMBER() OVER (ORDER BY value DESC) - 1
  FROM   table_name
),
assign_bins (value, rn, bin) AS (
  SELECT value,
         rn,
         CASE WHEN MOD(rn, 2 * 5) >= 5
         THEN 5 - MOD(rn, 5) - 1
         ELSE MOD(rn, 5)
         END
  FROM   indexed_values
)
SELECT bin,
       COUNT(*) AS num_values,
       SUM(value) AS bin_size
FROM   assign_bins
GROUP BY bin

Which, for some random data:

CREATE TABLE table_name ( value ) AS
SELECT FLOOR(DBMS_RANDOM.VALUE(1, 1000001)) FROM DUAL CONNECT BY LEVEL <= 1000;

May output:

BIN NUM_VALUES BIN_SIZE
0 200 100012502
1 200 100004633
2 200 99980342
3 200 99976774
4 200 100005756

It will not get the bins to have equal values but it is relatively simple and will get a close approximation if your values are approximately evenly distributed.


If you want to select values from a certain bin then:

WITH indexed_values (value, rn) AS (
  SELECT value,
         ROW_NUMBER() OVER (ORDER BY value DESC) - 1
  FROM   table_name
),
assign_bins (value, rn, bin) AS (
  SELECT value,
         rn,
         CASE WHEN MOD(rn, 2 * 5) >= 5
         THEN 5 - MOD(rn, 5) - 1
         ELSE MOD(rn, 5)
         END
  FROM   indexed_values
)
SELECT value
FROM   assign_bins
WHERE  bin = 0

fiddle

MT0
  • 143,790
  • 11
  • 59
  • 117