2

I've inherited the need to convert a production MySQL DB over to Postgres. This has mostly been handled without issue using simple SQL statements for table/function creation (using Navicat to generate semi-automated conversion), but now I am hitting a problem with converting a somewhat complex view.

Research suggests this might be due to differences in how the two DBs handle sub-queries (the WHERE statements), and perhaps it's just a syntax difference. The business logic here is unknown as the code-base has been inherited from another developer.

Running the following (using Laravel migrations / PHP script):

SELECT 
parent.is_owner AS is_owner,
parent.brand AS first_name,
parent.id AS id,
(SELECT count(c.id)
 FROM campaigns c
 WHERE((
       (c.user_id = parent.id)
       OR
       (c.user_id = child.id)
       )
       AND
       (c.campaign_status_id = 4)
))
AS current_campaigns,
(SELECT count(c.id)
    FROM campaigns c
    WHERE
        ((
        (c.user_id = parent.id)
        OR (c.user_id = child.id)
        )
        AND (c.campaign_status_id = 5)
))
AS past_campaigns,
(SELECT count(c.id)
    FROM campaigns c
    WHERE
        ((
         (c.user_id = parent.id)
         OR (c.user_id = child.id))
         AND (c.campaign_status_id = 2)
        ))
    AS pending_campaigns,
(SELECT count(c.id)
    FROM    campaigns c
    WHERE ((
            (c.user_id = parent.id)
            OR (c.user_id = child.id)
            )
            AND (c.invoice_status = '1')
        ))
    AS past_invoices
FROM ((users parent LEFT JOIN campaigns mc ON
     ((parent.id = mc.user_id)))
    LEFT JOIN users child ON ((child.parent_owner = parent.id)
    ))
WHERE
(
    (parent.is_owner = 1)
    OR (child.is_retailer = 1)
)
GROUP BY parent.id
ORDER BY parent.brand

... triggers the error

SQLSTATE[42803]: Grouping error: 7 ERROR:  subquery uses ungrouped column "child.id" from outer query
  LINE 1: ...c where (((c.user_id = parent.id) or (c.user_id = child.id)) ...

Can anyone suggest how to format this so Postgres runs the sub-queries?

BTW, the PHP code used here in a Laravel migration script is:

...

DB::unprepared("CREATE VIEW client AS
select parent.is_owner AS is_owner,parent.brand AS first_name,parent.id AS id
   ,(select count(c.id) from campaigns c where (((c.user_id = parent.id) or (c.user_id = child.id)) and (c.campaign_status_id = 4))) AS current_campaigns
   ,(select count(c.id) from campaigns c where (((c.user_id = parent.id) or (c.user_id = child.id)) and (c.campaign_status_id = 5))) AS past_campaigns
   ,(select count(c.id) from campaigns c where (((c.user_id = parent.id) or (c.user_id = child.id)) and (c.campaign_status_id = 2))) AS pending_campaigns
   ,(select count(c.id) from campaigns c where (((c.user_id = parent.id) or (c.user_id = child.id)) and (c.invoice_status = '1'))) AS past_invoices
from ((users parent
left join campaigns mc on((parent.id = mc.user_id)))
left join users child on((child.parent_owner = parent.id)))
where ((parent.is_owner = 1) or (child.is_retailer = 1))
group by parent.id
order by parent.brand;");

UPDATE, FIXED:

Brilliant. Very good input here from all.

The solutions from @patrick and @ErwinBrandstetter both work. I will favour Patrick's here as my role in this is to convert the system "as-is". There may be scope to refactor in the future, but at this stage I feel it risky to mess with (or improve) someone else's duct tape solution (i.e the code-base seems overly complex in places, with no sign of documentation, and I'm reluctant to poke around or attempt core improvement without more background info on business logic). I suspect parts of the model may need to be overhauled anyway, so [sic]-fix favoured here.

I suspected some click-jiggery might have generated the original query... trying to give the original dev the benefit of the doubt and assume there was some business pressure that called for a quick (i.e mousy) turn-around. Complex SQL is not my strong suit but I'm glad my instinct was correct, the query being unnecessary complex to begin with. Perhaps the view was an unplanned bolt-on -not designed in the first place. Wise or not, I'd probably have tried to hit the problem with an ORM based approach.

I'm on this project last minute, running cleanup for a re-launch (original dev was "let go"), so am working with a mostly undocumented code-base full of unknown functionality. Running paratrooper as it were. Thankfully, this view issue appears to the last piece of the puzzle. Thank you :-)

Daniel
  • 685
  • 1
  • 6
  • 13
  • 1
    The problem here is that unlike, MySQL, PostgreSQL actually adheres to the SQL standard. Which means that if you do a group by, all columns need to be in the group by or need a aggregate (min/max/avg/etc.) function. If you can add the table definitions and a tiny bit of example data to an sqlfiddle I can easily create the query for you :) – Wolph Jul 18 '15 at 08:34
  • I don't see your version of Postgres anywhere? It's essential to determine the best answer. Also, the error is due to an ambiguity in the query and you did not provide information which way to resolve it. Basically: Do you want to count campaigns that are linked to multiple children multiple times or just once? – Erwin Brandstetter Jul 18 '15 at 13:44
  • Good luck with your re-launch and keep your fingers crossed. +1 for a well-posed question and follow-up. – Patrick Jul 20 '15 at 00:49

2 Answers2

6

Oh my, oh my. The developer had a tic in his right ring finger, no doubt, because the statement had no less than 74 parentheses. Here is how you can get by with just 8 parentheses and 14 lines instead of 54:

SELECT 
  parent.is_owner AS is_owner,
  parent.brand AS first_name,
  parent.id AS id,
  sum(CASE WHEN c.campaign_status_id = 4 THEN 1 ElSE 0 END) AS current_campaigns,
  sum(CASE WHEN c.campaign_status_id = 5 THEN 1 ElSE 0 END) AS past_campaigns,
  sum(CASE WHEN c.campaign_status_id = 2 THEN 1 ElSE 0 END) AS pending_campaigns,
  sum(CASE WHEN c.invoice_status = '1' THEN 1 ElSE 0 END) AS past_invoices,
FROM users parent
LEFT JOIN users child ON child.parent_owner = parent.id
LEFT JOIN campaigns c ON c.user_id = parent.id OR c.user_id = child.id
WHERE parent.is_owner = 1 OR child.is_retailer = 1
GROUP BY parent.is_owner, parent.brand, parent.id
ORDER BY parent.brand;

No sub-selects means this code will much run faster to boot. Like Wolph mentioned in his comment, every column in the select list not included in an aggregate function must appear in the GROUP BY clause, as specified by the SQL standard, but blissfully ignored by MySQL.

The sub-selects are avoided by using the CASE construct: conditional expression evaluation in the column list. Note that the repetitive clause of the filtering in the sub-selects is now performed as the JOIN clause, only the one relevant column in campaigns is evaluated per column in the main query. Issuing 1 or 0 from the CASE statement and wrapping that in a sum() function is a nifty trick to do multiple, distinct counts in a single query.

As Wolph noted in his comment below this answer, the clause

sum(CASE WHEN c.campaign_status_id = 4 THEN 1 ElSE 0 END) AS current_campaigns

can also be written more succinctly as

sum((c.campaign_status_id = 4)::integer) AS current_campaigns

This may be somewhat faster than the CASE statement given that the boolean to integer cast in the C language in which PostgreSQL is written does not require any operation (a boolean is either 1 or 0 in C). Legibility is certainly less (not to mention the use of twice as many parentheses!).

Patrick
  • 29,357
  • 6
  • 62
  • 90
  • 1
    Excellent answer, +1 from me :) I'm not sure about the counts though, shouldn't that be `SUM()` in this case? Also, this might be easier than a case statement: `(c.campaign_status_id = 4)::integer` – Wolph Jul 18 '15 at 09:28
  • @Wolph Well-spotted on the `sum()` issue (those pesky errors in converting existing code!), answer updated and +1 on your comment, for what it is worth. The cast is less useful, or so it appears, as `CASE` wants a boolean expression. – Patrick Jul 18 '15 at 09:45
  • 1
    I meant that you can drop the case entirely when using the cast. Casting the boolean to integer will give 0 or 1 depending on true or false – Wolph Jul 18 '15 at 09:47
  • @Wolph Oh, that is nifty indeed. Almost below the belt, in terms of legibility anyway. Any insights on performance impact? I'll paste it in my answer. – Patrick Jul 18 '15 at 10:46
  • In my experience Postgres can optimise both fairly well. It's just a personal preference for me since I find case statements too verbose to be readable – Wolph Jul 18 '15 at 12:47
  • This query is probably wrong (like the original): If a user has several children, all campaigns the parent user "owns" would be counted several times, because the join condition ` c.user_id = parent.id OR c.user_id = child.id` is TRUE every time. It would work is parent users never own campaigns directly (which has not been specified). – Erwin Brandstetter Jul 18 '15 at 16:18
  • @ErwinBrandstetter I would agree with you if not for "production MySQL DB" and "the business logic here is unknown" in the question. So I opted for a sanitization of the code provided, rather than offer insights on the business logic. Another obvious design issue: why create the view in code, instead of in the DBMS? And why is the business logic not properly documented? – Patrick Jul 19 '15 at 03:47
  • @patrick - Laravel comes with a DB migrations system which allows for highly granular changes via a simple SSH command. Updates are deployed / rolled back (i.e effectively a kind of DB version control) via classes (one class per table or change...). This works very nicely with automated deployments and reduces a lot of DB admin. It's also very easy for devs to follow how the data structure has evolved (it's a lot cleaner than pure SQL scripts). I find it especially useful for quickly creating/seeding test environments automatically. http://laravel.com/docs/5.0/migrations – Daniel Jul 19 '15 at 16:04
  • @patrick - re business logic: inherited snafu. Very little documentation throughout. So it goes... your answer was right on target with regard to business and code-smell constraints. – Daniel Jul 19 '15 at 16:08
  • Thanks all for the insight. See update to question above. – Daniel Jul 19 '15 at 16:40
2

Explanation is missing in the question, but the probable use case is:

Count how many campaigns each user "owns". A user can have child users, the campaigns of which shall be added to the parent user.

Apart from the incredibly noisy syntax that @Patrick decluttered in his demo, the query is also ambiguous (and probably wrong altogether):

If we can assume:

  • Referential integrity: child users only reference existing parent user, enforced with a FOREIGN KEY constraint.

  • Parents and children are marked is_owner / is_retailer reliably, those columns only hold values 0 and 1. See below.

This query would do the job:

SELECT CASE WHEN u.is_retailer = 1 THEN u.parent_owner
            WHEN u.is_owner = 1    THEN u.id END        AS user_id
     , max(u.is_owner)                                  AS is_owner
     , max(u.brand) FILTER (WHERE u.is_owner = 1)       AS first_name
     , count(*) FILTER (WHERE c.campaign_status_id = 4) AS current_campaigns
     , count(*) FILTER (WHERE c.campaign_status_id = 5) AS past_campaigns
     , count(*) FILTER (WHERE c.campaign_status_id = 2) AS pending_campaigns
     , count(*) FILTER (WHERE c.invoice_status = '1')   AS past_invoices
FROM   users          u
LEFT   JOIN campaigns c ON u.id = c.user_id
                       AND (c.campaign_status_id IN (4, 5, 2) OR 
                            c.invoice_status = '1')  -- exclude irrelevant early
WHERE  1 IN (u.is_owner, u.is_retailer)  -- parent & child, may be redundant
GROUP  BY 1
ORDER  BY 2;

Should be pretty fast. Be sure to have fitting indexes for big tables.
This condition is redundant if there are no other options:

   WHERE  1 IN (u.is_owner, u.is_retailer)

I worked with your data model "as is", but you probably should just have boolean columns:

  • is_child: true for children, false for parents.
  • is_owner: true for owner, false for retailer.

Using the new aggregate FILTER clause introduced in Postgres 9.4:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks @ErwinBrandstetter. This works also. I may pursue this in the future but for now (due to current project scope) am running with [sic]-fix from Patrick. See update to original question. – Daniel Jul 19 '15 at 16:33