1

I am having difficulty converting a view from T-SQL to Postgresql as it relates to the aggregate functions.

Here is my original SQL query:

SELECT TOP (100) PERCENT 
    thread_id, 
    MIN(message_id) AS message_id,  
    MIN(parent_message_id) AS parent_message_id, 
    MIN(created_at) AS initialResponse
FROM 
    dbo.bi_linear_thread
WHERE 
    LEFT([subject], 5) LIKE '%RE:%' AND parent_message_id IS NOT NULL
GROUP BY 
    thread_id
ORDER BY 
    thread_id

I have attempted to utilize the window function of:

first_value(message_id) OVER (Partition BY message_id ORDER BY messageid)

but continue to get incorrect returns.

Thoughts?

Edit for more context

-- after the answer provided. Hopeful this is helpful to others.

Needed to read the first line in a group that excluded the first record in a hierarchal order of threads to replies.
thread_id represents the thread.
message_id represents any message type whether a reply/original message. parent_message_id represents the original message in the thread.
"RE:" is a specified format for replies regardless whether nested or not in the subject field.

Community
  • 1
  • 1

2 Answers2

2
SELECT thread_id
      ,MIN(message_id) AS message_id
      ,MIN(parent_message_id) AS parent_message_id
      ,MIN(created_at) AS initialResponse
FROM   dbo.bi_linear_thread
WHERE  left(subject, 5) LIKE '%RE:%'
AND    parent_message_id IS NOT NULL
GROUP  BY thread_id
ORDER  BY thread_id;

Except for removing the illegal brackets from [subject] and also removing TOP (100) PERCENT, which is just noise anyway, the query should work.

You might want ILIKE instead of LIKE for case-insensitive pattern-matching.

left() was introduced with Postgres 9.1.

If you want CaMeL-case identifiers you need to double-quote: "initialResponse". Else they are lower-cased automatically. My advice is to use lower-case identifiers exclusively.

Educated guess

If you actually want the row with the earliest created_at per thread_id, which you cunningly kept a secret, your query is incorrect, tSQL or PostgreSQL alike.
You could use DISTINCT ON for that.

You probably don't have a schema named dbo in Postgres either, and that's an artifact from SQL Server, so I removed that, too.

I use ILIKE as explained above.

SELECT DISTINCT ON (thread_id)
       thread_id, message_id, parent_message_id, created_at AS initial_response
FROM   bi_linear_thread
WHERE  left(subject, 5) ILIKE '%RE:%'
AND    parent_message_id IS NOT NULL
ORDER  BY thread_id, created_at;

Information on DISTINCT ON:
Select first row in each GROUP BY group?

For perfomance, a trigram GIN index on the expression left(subject, 5) might help quite a bit. More info:
PostgreSQL LIKE query performance variations

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks, that cleans up the dirty TSQL... I'm going from T-SQL to Postgresql 8.2.x errr whatever... – standardcandle- Jul 26 '13 at 22:55
  • 1
    @standardcandle-: **8.2**?? Really?? That is extremely outdated and has been unsupported for some while now. You should really go for 9.2 instead. Do not use 8.2 –  Jul 26 '13 at 23:08
  • @standardcandle-: What a_horse said. [Consider this](http://www.postgresql.org/support/versioning/). Maybe just a typo and you meant 9.2? `left()` does not exist in 8.2 for starters. – Erwin Brandstetter Jul 26 '13 at 23:10
  • Thanks for your help... you were 100% right... I had ~3% error on records on the way I was trying to do it. I was unaware of the DISTINCT ON As for the version... I'm using an outdated version of GreenPlum, and waiting on IT to upgrade the environment. Thanks for the insight! – standardcandle- Jul 29 '13 at 22:42
0

This was the closest I came on my own... better answer above.

SELECT DISTINCT 
 thread_id
,first_value(message_id) OVER (Partition BY thread_id ORDER BY message_id) AS message_id
,first_value(parent_message_id) OVER (Partition BY thread_id ORDER BY parent_message_id) AS parent_message_id
,first_value(created_at) OVER (Partition BY thread_id ORDER BY created_at) AS initialResponse 
FROM bi_linear_thread
WHERE parent_message_id IS NOT NULL AND subject ~* '.*RE:.*'
GROUP BY message_id, thread_id, parent_message_id, created_at
ORDER BY thread_id desc