1

I am trying to insert data from a source table in to a temporary table using the NEWID() function so that I get a (fairly) random selection of lines from my source table.

Looking at the below code, I insert the data I need into the temp table #x and at Point 1 where I select from #x, it returns the data in a random order.

However, at Point 2 WHERE I am narrowing down the data from #x (I accumulate lines until a certain quantity is reached) the SELECT no longer returns random rows - it returns rows in a sequential order from the start of the table.

DROP TABLE IF EXISTS #x
CREATE TABLE #x (Id INT, Commodity VARCHAR(3), Quantity FLOAT, RowNum INT, TotalQuantity FLOAT)
INSERT INTO #x (id,commodity,quantity,rownum,totalquantity)
  SELECT
    i.id, i.commodity, i.quantity, ROW_NUMBER() OVER (ORDER BY i.id), SUM(i.quantity) OVER (ORDER BY i.id RANGE UNBOUNDED PRECEDING)
  FROM inventory i
  WHERE .........
        .........
  ORDER BY NEWID()


SELECT * FROM #x   -------- **POINT 1**


DECLARE @y INT = (SELECT MIN(rownum) AS minrownum FROM #x WHERE totalquantity >= @tonnes)


SELECT #x.id, #x.commodity, #x.quantity, #x.rownum, #x.totalquantity FROM #x
WHERE #x.rownum <= @y
ORDER BY NEWID()         -------- **POINT 2**

Any ideas on what I am missing?

Thanks.

Johnathan
  • 879
  • 3
  • 12
  • 22
  • Your query and question are confusing. Part of the question seems to want a random set of rows. However, the code is generating the first N rows that add to a particular value. – Gordon Linoff Jul 07 '16 at 15:37
  • I'm a bit confused myself Gordon. What I am trying to do is select random lines from a table until a certain cumulative quantity is reached. – Johnathan Jul 07 '16 at 15:44
  • FYI, Postgres 9.5 gained the [`TABLESAMPLE`](https://www.postgresql.org/docs/current/static/sql-select.html) command to nearly-randomly pull a sample subset of the rows. Discussed at end of [this Answer](http://stackoverflow.com/a/8675160/642706) to similar question. – Basil Bourque Jul 08 '16 at 02:28

1 Answers1

1

LOL. Your row number is deterministic. There may be better ways to do what you want, but you can fix the above code by randomizing the row number:

ROW_NUMBER() OVER (ORDER BY newid())

The outer ORDER BY is probably unnecessary.

Your query, though is quite confusing. It is selecting the first N rows (by id) that sum up to the total quantity. This makes a lot of sense. I'm not sure what all the randomization is for.

EDIT:

If you need to get random rows until a certain number is reached, you can do:

SELECT i.*
FROM (SELECT i.*, SUM(i.quantity) OVER (ORDER BY NEWID()) as cume_quantity
      FROM inventory i
      WHERE .........
            .........
     ) i
WHERE cume_quantity - quantity < @tonnes;

You don't need a temporary table. You don't need additional queries.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • It needs to select random rows unfortunately. The IDs relate to inventory and I can't give out the same numbers each time the query is executed. – Johnathan Jul 07 '16 at 15:39