How can I aggregate and collapse rows in a database table using SQL?

Question

I have a table with example data as shown below.

word       | last_seen  | first_seen | count
-----------|------------|------------|------
definition | 2014-09-08 | 2012-01-02 | 15
definition | 2014-10-11 | 2013-05-12 | 35
attribute  | 2013-07-23 | 2010-06-29 | 22

I'm wanting to to an in-place aggregation of the data, hopefully just using SQL, where the data for repeated words is such that I end up with MAX(last_seen), MIN(first_seen), and SUM(count).

word       | last_seen  | first_seen | count
-----------|------------|------------|------
definition | 2014-10-11 | 2012-01-02 | 50
attribute  | 2013-07-23 | 2010-06-29 | 22

I know I can see the results of the aggregation with the following:

SELECT 
  word, 
  MAX(last_seen) AS last_seen, 
  MIN(first_seen) AS first_seen, 
  SUM(count) AS count 
FROM 
  words 
GROUP BY word;

However, I don't just want to see the resulting aggregation... I want to actually update the words table, replacing the rows that have duplicate word column entries with the aggregated data.

I dont see why your code isnt giving you what you want. can you elaborate on `I just don't know how to update the words table in-place with the results` — Matt, Dec 10 '14 at 14:37
You have at least two statements, one to update, the other to delete the now irrelevant record. — crthompson, Dec 10 '14 at 14:43
Are you sure you want to update the existing table, or do you just want a view with the aggregated data? What if it changes again? — sgeddes, Dec 10 '14 at 14:44
You can do a combined select, update and delete in one query using the with clause: http://www.postgresql.org/docs/9.3/static/sql-select.html — Wolph, Dec 10 '14 at 15:04

score 1 · Accepted Answer · answered Dec 10 '14 at 14:57

As far as I'm aware there is no "Edit in place" in Postgresql (or any other traditional RDBMS that I can think of). Instead:

Take the results of your query and dump them into a temp table: CREATE TEMP TABLE <temptable> AS <Your Query> WITH DATA
Delete out everything in your word table: TRUNCATE word; <--This is the scary part so make sure you are cool with your query before truncating.
Insert the records in your temp table into the now empty word table: INSERT INTO word SELECT * FROM <temptable>;
Optionally: Drop your temp table DROP TABLE <temptable>; (being a temp table it will drop automagically when you end your session, but I'm a fan of being explicit)

score 0 · Answer 2 · edited May 23 '17 at 12:05

Actually you can do that in a single statement using data-modifying CTEs:

WITH del AS (
   DELETE FROM words w
   WHERE EXISTS (
      SELECT 1 
      FROM   words w1
      WHERE  w1.word = w.word
      AND    w1.ctid <> w.ctid
     )
   RETURNING *
   )
INSERT INTO words(word, last_seen, first_seen, count)
SELECT word, MAX(last_seen), MIN(first_seen), SUM(count)
FROM   del
GROUP  BY word;

Should be rather efficient, too.

SQL Fiddle.

About ctid:

In-order sequence generation

About CTEs:

Are SELECT type queries the only type that can be nested?

How can I aggregate and collapse rows in a database table using SQL?

2 Answers2