Aggregate text based on a criteria

Question

On a previous question I asked a similar question that relied on a helper table to be a part of the criteria for splitting the data. It seems that my current goal is easier, but I couldn't figure it out.

Given the table:

CREATE TABLE conversations (id int, record_id int, is_response bool, text text);
INSERT INTO conversations VALUES
  (1,  1,  false, 'in text 1')
, (2,  1,  true , 'response text 1')
, (3,  1,  false, 'in text 2')
, (4,  1,  true , 'response text 2')
, (5,  1,  true , 'response text 3')
, (6,  2,  false, 'in text 1')
, (7,  2,  true , 'response text 1')
, (8,  2,  false, 'in text 2')
, (9,  2,  true , 'response text 2')
, (10, 2,  true , 'response text 3');

I would like to aggregate the text based on the is_response value and output the following:

 record_id | aggregated_text                                   |
 ----------+---------------------------------------------------+
 1         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 1         |in text 2 response text 2 response text 3          |
 ----------+---------------------------------------------------+
 2         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 2         |in text 2 response text 2 response text 3          |

I've tried the following query, but it fails to aggregate two responses in a row, IE :is_response is true in a sequence.

SELECT
    record_id,
    string_agg(text, ' ' ORDER BY id) AS aggregated_text
FROM (
    SELECT
        *,
        coalesce(sum(incl::integer) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),0) AS grp
    FROM (
        SELECT *, is_response as incl
        FROM conversations
         ) c
     ) c1
GROUP BY record_id, grp
HAVING bool_or(incl)
ORDER BY max(id);

The output of my query just adds another line for the following is_response row like so:

 record_id | aggregated_text                                   |
 ----------+---------------------------------------------------+
 1         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 1         |in text 2 response text 2                          |
 ----------+---------------------------------------------------+
 1         |response text 3                                    |
 ----------+---------------------------------------------------+
 2         |in text 1 response text 1                          |
 ----------+---------------------------------------------------+
 2         |in text 2 response text 2                          |
 ----------+---------------------------------------------------+
 2         | response text 3                                   |
 ----------+---------------------------------------------------+

How can I fix it?

Thanks for your reply, can you help out with a code example? — Shlomi Schwartz, Sep 26 '16 at 16:07
Don't have time right now. Just wanted to offer a pointer for something to look into. — jpmc26, Sep 26 '16 at 16:18
@Erwin, thank you so much for your recent help! The new questions is not exactly the same (no responses table) and I couldn't draw attention to the previous question. I can see your response there, however... Unfortunately your query does not produce the desired output. — Shlomi Schwartz, Sep 26 '16 at 20:34
Ok, I reopened. It's just a simpler version of your last question, though. [My answer over there](http://stackoverflow.com/a/39643517/939860) produces *exactly* the requested result. — Erwin Brandstetter, Sep 26 '16 at 21:09
I took the liberty to provide your test table as `CREATE TABLE` script. That's the preferred form, makes answering much easier. And please remember to provide your version of Postgres. Else we can only assume current version, which is often cause for confusion. — Erwin Brandstetter, Sep 26 '16 at 21:17

score 1 · Answer 1 · edited May 23 '17 at 10:27

1

This is basically a simpler version of your previous question.

SELECT record_id, string_agg(text, ' ') As context
FROM  (
   SELECT *, count(NOT is_response OR NULL) OVER (PARTITION BY record_id ORDER BY id) AS grp
   FROM   conversations
   ORDER  BY record_id, id
   ) sub
GROUP  BY record_id, grp
ORDER  BY record_id, grp;

Produces exactly the desired result, with a single window function in a subquery and then aggregate.

Detailed explanation and links in my answer to your last question:

Select partitions based on matches in other table

edited May 23 '17 at 10:27

Community

1
1

answered Sep 26 '16 at 21:07

Erwin Brandstetter

605,456
145
1,078
1,228

and it does not aggregate more than one false is_response: meaning when there are 2 "in text" or more in a row, it just aggregate the second one – Shlomi Schwartz Sep 27 '16 at 07:21
@ShlomiSchwartz: Neither of these things is in your question. It also does not seem to make sense to aggregate rows that are *not* response to the previous row. – Erwin Brandstetter Sep 27 '16 at 10:33
I was not descriptive enough. However it do make sense to collect those lines, since context is built over time, check out [my question](http://stackoverflow.com/questions/39489933/how-to-break-conversation-data-into-pairs-of-context-response) it could explain my goal clearer. Thanks again for your help! – Shlomi Schwartz Sep 27 '16 at 11:20

score 1 · Accepted Answer · edited May 23 '17 at 12:18

Here's a variation of the answer I gave in your previous question:

SELECT record_id, string_agg(text, ' ')
FROM (
    SELECT *, coalesce(sum(incl::integer) OVER w,0) AS subgrp
    FROM (
        SELECT *, is_response AND NOT coalesce(lead(is_response) OVER w,false) AS incl
        FROM conversations
        WINDOW w AS (PARTITION BY record_id ORDER BY id)
    ) t
    WINDOW w AS (PARTITION BY record_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
) t1
GROUP BY record_id, subgrp
HAVING bool_or(incl)
ORDER BY min(id);

The idea is that for each row we look at the next row of the same record with the help of the lead window function. If there is not such row, or if there is one and its is_response is false while the current is_response is true, then we select that row, aggregating all previous unused values of text.

This query also ensures that if the last conversation if incomplete (which doesn't happen in you sample data), it will be omitted.

Aggregate text based on a criteria

2 Answers2