11

Given this table:

# select * from messages;
id verbosity
1 20
2 20
3 20
4 30
5 100

(5 rows) I would like to select N messages for which sum of verbosity is lower than N. So if N = 70, the desired result will be messages with id 1, 2 and 3. It's important the solution is database-independent. It should work at least on PostgreSQL and SQLite.

Something like:

SELECT * FROM messages GROUP BY id HAVING SUM(verbosity) < 70;

doesn't sum all values from the verbosity column.

user4157124
  • 2,809
  • 13
  • 27
  • 42
user1105595
  • 591
  • 2
  • 8
  • 20
  • 2
    Imagine if you have 10 rows all with `verbosity=20`, which three of them should the database report as the result of your SELECT? It seems like this query is vague in its current form. Maybe you need to add more criteria. For example, if you have `1: 29, 2: 30, 3: 20, 4: 20, 5: 10`, should the database give you `1, 2`? Should it give `1, 3, 4`? Or maybe `1, 2, 5`? – Shahbaz Jul 27 '12 at 13:49
  • Why are you expecting only 1,2,3 - why not 4? – Dave Richardson Jul 27 '12 at 13:52

2 Answers2

25
SELECT m.id, sum(m1.verbosity) AS total
FROM   messages m
JOIN   messages m1 ON m1.id <= m.id
WHERE  m.verbosity < 70    -- optional, to avoid pointless evaluation
GROUP  BY m.id
HAVING SUM(m1.verbosity) < 70
ORDER  BY total DESC
LIMIT  1;

This assumes a unique, ascending id like you have in your example.


In modern Postgres - or generally with modern standard SQL (but not in SQLite):

Simple CTE

WITH cte AS (
   SELECT *, sum(verbosity) OVER (ORDER BY id) AS total
   FROM   messages
   )
SELECT *
FROM   cte
WHERE  total < 70
ORDER  BY id;

Recursive CTE

Should be faster for big tables where you only retrieve a small set.

WITH RECURSIVE cte AS (
   (  -- parentheses required
   SELECT id, verbosity, verbosity AS total
   FROM   messages
   ORDER  BY id
   LIMIT  1
   )

   UNION ALL 
   SELECT c1.id, c1.verbosity, c.total + c1.verbosity 
   FROM   cte c
   JOIN   LATERAL (
      SELECT *
      FROM   messages
      WHERE  id > c.id
      ORDER  BY id
      LIMIT  1
      ) c1 ON  c1.verbosity < 70 - c.total
   WHERE c.total < 70
   )
SELECT *
FROM   cte
ORDER  BY id;

All standard SQL, except for LIMIT.

Strictly speaking, there is no such thing as "database-independent". There are various SQL-standards, but no RDBMS complies completely. LIMIT works for PostgreSQL and SQLite (and some others). Use TOP 1 for SQL Server, rownum for Oracle. Here's a comprehensive list on Wikipedia.

The SQL:2008 standard would be:

...
FETCH  FIRST 1 ROWS ONLY

... which PostgreSQL supports - but hardly any other RDBMS.

The pure alternative that works with more systems would be to wrap it in a subquery and

SELECT max(total) FROM <subquery>

But that is slow and unwieldy.

db<>fiddle here
Old sqlfiddle

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • @Erwin Would you mind to translate the recursive CTE query into SQL Server? there are some more syntax changes required rather than just the TOP 1, like LATERAL and RECURSIVE words in the statements. – Najera Dec 31 '18 at 20:03
1

This will work...

select * 
from messages
where id<=
(
    select MAX(id) from
    (
        select m2.id, SUM(m1.verbosity) sv 
        from messages m1
        inner join messages m2 on m1.id <=m2.id
        group by m2.id
    ) v
    where sv<70
)

However, you should understand that SQL is designed as a set based language, rather than an iterative one, so it designed to treat data as a set, rather than on a row by row basis.

podiluska
  • 50,950
  • 7
  • 98
  • 104