0

This is how I do it currently:

DECLARE tmp message%ROWTYPE;
BEGIN;
SELECT * INTO tmp FROM [...]
DELETE FROM message m WHERE m.id = tmp.id;
END;

I'm afraid that the db will do two queries here: One for doing the SELECT and one for the DELETE. In case this is true - can I make this more efficient somehow? After all the row that should be deleted was already found in the SELECT query.

N.b. I'm eventually storing something from the SELECT query and return it from the function. The above is just simplified.

ben
  • 5,671
  • 4
  • 27
  • 55

2 Answers2

1
delete from message m
using (
    select *
    from ...
) s
where m.id = s.id
returning s.*
Clodoaldo Neto
  • 118,695
  • 26
  • 233
  • 260
  • This works but I don't quite understand what the "using" part is doing since in the postgres documentation it is mentioned that "USING is a shorthand notation: it takes a comma-separated list of column names, which the joined tables must have in common, and forms a join condition specifying equality of each of these pairs of columns." (http://www.postgresql.org/docs/9.2/static/queries-table-expressions.html) But you are passing a query to "USING" instead of columns. – ben Oct 12 '14 at 20:28
  • 1
    @ben The `using` you mention is for joins. This one is a list of table expressions: http://www.postgresql.org/docs/current/static/sql-delete.html – Clodoaldo Neto Oct 13 '14 at 00:06
1

For simple cases you do not even need a subquery.

If your secret first query only involves the same table:

DELETE FROM message m
WHERE  m.id = <something>
RETURNING m.*;          -- or what you need from the deleted row.

If your secret first query involves one or more additional tables:

DELETE FROM message m
USING  some_tbl s
WHERE  s.some_column = <something>
AND    m.id = s.id
RETURNING m.id, s.foo;  -- you didn't define ...

Solution for actual query (after comments)

An educated guess, to delete the oldest row (smallest timestamp) from each set with identical id:

DELETE FROM message m
USING (
   SELECT DISTINCT ON (id)
          id, timestamp
   FROM   message
   WHERE  queue_id = _queue_id
   AND    source_client_id = _source_client_id
   AND   (target_client_id IN (-1, _client_id))
   ORDER  BY id, timestamp
   ) sub
WHERE m.id = sub.id
AND   m.timestamp = sub.timestamp
RETURNING m.content
INTO  rv;

Or, if (id, timestamp) is UNIQUE, NOT EXISTS is probably faster:

DELETE FROM message m
WHERE  queue_id = _queue_id
AND    source_client_id = _source_client_id
AND    target_client_id IN (-1, _client_id)
AND    NOT EXISTS (
   SELECT 1
   FROM   message
   WHERE  queue_id = _queue_id
   AND    source_client_id = _source_client_id
   AND    target_client_id IN (-1, _client_id)
   AND    id = m.id
   AND    timestamp < m.timestamp
   ) sub
WHERE  m.id = sub.id
RETURNING m.content
INTO   rv;

More about DISTINCT ON and selecting "the greatest" from each group:

If performance is your paramount objective, look to the last chapter ...

Aside: timestamp is a basic type name in Postgres and a reserved word in standard SQL. Don't use it as identifier.

Solution in comment below, audited:

DELETE FROM message m
USING (
   SELECT id
   FROM   message
   WHERE  queue_id = _queue_id
   AND    target_client_id IN (client_id, -1)
   ORDER  BY timestamp
   LIMIT  1
   ) AS tmp
WHERE  m.id = tmp.id
RETURNING m.content
INTO rv;

INTO ... only makes sense inside a plpgsql function, of course.
An index on (queue_id, timestamp) makes this fast - possibly even a partial index with the condition WHERE target_client_id IN (client_id, -1) (depends).

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • The query does only involve the message table only indeed. However it is structured like so `DELETE FROM message WHERE id IN (SELECT DISTINCT ON (timestamp) id FROM message WHERE message.queue_id = _queue_id AND message.source_client_id = _source_client_id AND (message.target_client_id = -1 OR message.target_client_id = _client_id) ORDER BY timestamp ASC) RETURNING content INTO rv;` I don't think one can do without the subquery in the WHERE clause. – ben Oct 16 '14 at 06:21
  • @ben: Right, this needs a subquery. However, `SELECT DISTINCT ON (timestamp) id ... ORDER BY timestamp` doesn't make sense. Can you add the query you have now and explanation what it's supposed to do *exactly* to your question? Which ID from each set of identical timestamps do you want to delete? – Erwin Brandstetter Oct 16 '14 at 06:32
  • @ben: A typical query would be to delete "the oldest row per `id`". Is that what you actually want? – Erwin Brandstetter Oct 16 '14 at 06:45
  • Almost. `id` is unique. I want to delete the oldest row which satisfies the WHERE condition. – ben Oct 19 '14 at 11:48
  • Whoops I just realized that `DISTINCT ON` hardly makes any sense on a unique field. I started over and came up with: `DELETE FROM message m USING (SELECT * FROM message WHERE message.queue_id = _queue_id AND (message.target_client_id = client_id OR message.target_client_id = -1) ORDER BY timestamp ASC LIMIT 1) AS tmp WHERE m.id = tmp.id RETURNING m.content INTO rv;` Sorry that I did not give you sufficient information right away. Your help is much appreciated. – ben Oct 19 '14 at 16:06
  • @ben: Yes, for a single row, `ORDER BY .. LIMIT 1` makes sense now - and is typically fast with index support. I added an audited version to my answer and a bit about indexes. – Erwin Brandstetter Oct 19 '14 at 16:40
  • Wow I even had the index you proposed already created! At least one thing I did not do wrong :) Your help and your answer is great. I changed the accepted answer to yours now since it is more extensive - even without the parts which are specific to my query. – ben Oct 19 '14 at 20:43