1

I have three queries with results.

Query 1:

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 1 AND workid < 557 AND workid > 188
GROUP BY employeeid, clientid ORDER BY clientid ASC;

employeeid | clientid | under_over_1
------------+----------+--------------
          1 |        3 |         0.54
          1 |        4 |         0.47
          1 |        6 |         0.45
          1 |        7 |         0.59
          . |        . |           .
          . |        . |           .

Query 2:

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 2
GROUP BY employeeid, clientid ORDER BY clientid ASC;

 employeeid | clientid | under_over_1
------------+----------+--------------
          2 |        2 |         1.01
          2 |        3 |         0.21
          2 |        4 |         0.71
          2 |        6 |         0.68
          . |        . |           .
          . |        . |           .

Query: 3

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 3
GROUP BY employeeid, clientid ORDER BY clientid ASC;

 employeeid | clientid | under_over_1
------------+----------+--------------
          3 |        4 |         0.70
          3 |        6 |         0.54
          3 |        7 |         1.03
          3 |       11 |         0.74
          . |        . |           .
          . |        . |           .

I want to output one table with the result of all three queries such as (I am sorry but I have to write more here so I can submit this post. I hope this is enough;-)):

employeeid | clientid | under_over_1
------------+----------+--------------
          1 |        3 |         0.54
          1 |        4 |         0.47
          1 |        6 |         0.45
          1 |        7 |         0.59
          . |        . |           .
          . |        . |           .
          2 |        2 |         1.01
          2 |        3 |         0.21
          2 |        4 |         0.71
          2 |        6 |         0.68
          . |        . |           .
          . |        . |           .
          3 |        4 |         0.70
          3 |        6 |         0.54
          3 |        7 |         1.03
          3 |       11 |         0.74
          . |        . |           .
          . |        . |           .

I tried UNION ALL as in

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 1 AND workid < 557 AND workid > 188
GROUP BY employeeid, clientid ORDER BY clientid ASC

UNION ALL

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 3
GROUP BY employeeid, clientid ORDER BY clientid ASC

UNION ALL

SELECT DISTINCT employeeid, work.clientid, ROUND ((CAST (AVG(current_lawn_price) AS numeric) / CAST (AVG((((EXTRACT(HOUR FROM job_finish)*60) + EXTRACT(MIN FROM job_finish))) - ((EXTRACT(HOUR FROM job_start)*60) + EXTRACT(MIN FROM job_start))) AS numeric)) / 1.31666666666667, 2) AS under_over_1
FROM work
JOIN timesheet USING (date_linkid)
JOIN client USING (clientid)
WHERE employeeid = 2
GROUP BY employeeid, clientid ORDER BY clientid ASC;

However, I get the following error:

ERROR:  syntax error at or near "UNION"
LINE 7: UNION ALL

I am not sure why this is wrong or whether UNION ALL is the right approach here. Anyone have a clue?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Christian Hick
  • 401
  • 3
  • 10

2 Answers2

1

Remove all the ORDER BYs. Then use the resulting query as a derived table. To retain the order you presumably want, you can then use a CASE expression to map the employeeid to and integer determining the order:

SELECT x.employeeid,
       x.clientid,
       x.under_over_1
       FROM (<your UNION ALL query without the ORDER BYs>) x
       ORDER BY CASE x.employeeid
                  WHEN 1 THEN
                    1
                  WHEN 2 THEN
                    3
                  WHEN 3 THEN
                    2
                END,
                x.clientid;
sticky bit
  • 36,626
  • 12
  • 31
  • 42
  • I am afraid the CASE statement didn't work. I get error message: ```ERROR: invalid UNION/INTERSECT/EXCEPT ORDER BY clause LINE 20: GROUP BY employeeid, clientid ORDER BY CASE employeeid ^ DETAIL: Only result column names can be used, not expressions or functions. HINT: Add the expression/function to every SELECT, or move the UNION into a FROM clause. ``` – Christian Hick Dec 15 '19 at 23:26
  • @ChristianHick: Yeah right, sorry. You'd need to use a derived table additionally for such expressions to work. I edited the answer. – sticky bit Dec 15 '19 at 23:40
1

The immediate cause for the error is, quoting the manual:

(ORDER BY and LIMIT can be attached to a subexpression if it is enclosed in parentheses. Without parentheses, these clauses will be taken to apply to the result of the UNION, not to its right-hand input expression.)

Bold emphasis mine.

So, as suggested by @wilx, enclosing each SELECT in parentheses would fix that.

But there is more.

Merge into single query

SELECT employeeid, work.clientid    -- no DISTINCT
     , ROUND ((AVG(current_lawn_price)
             / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
FROM   work
JOIN   timesheet USING (date_linkid)
JOIN   client    USING (clientid)
WHERE (employeeid IN (1, 2) OR
       employeeid = 3
   AND workid < 557
   AND workid > 188)
GROUP  BY employeeid, clientid 
ORDER  BY CASE employeeid
            WHEN 1 THEN 1
            WHEN 2 THEN 3
            WHEN 3 THEN 2
          END
        , clientid;

Merge the 3 SELECT queries.

Remove the redundant DISTINCT. That's an expensive no-op after GROUP BY.

Instead of extracting hours and minutes from both begin and end timestamps etc., compute the interval by plain subtraction (works with timestamp or time values alike) and extract the epoch from it. Gives you the number of seconds. Divide by 60 and you got the number of minutes much faster. 79 being the result of 60 * 1.31666666666667, accordingly.

The manual about extracting epoch:

For timestamp with time zone values, the number of seconds since 1970-01-01 00:00:00 UTC (can be negative); for date and timestamp values, the number of seconds since 1970-01-01 00:00:00 local time; for interval values, the total number of seconds in the interval

Bold emphasis mine.

Since this removes UNION ALL, the parentheses mentioned at the top are not needed any more.

The CASE expression makes up for the mixed order in employeeid, like sticky bit provided.

If queries cannot be merged

If, for some reason, you can't or won't merge the three original SELECT queries, do this instead:

(  -- parentheses required
SELECT employeeid, work.clientid    -- no DISTINCT !
     , ROUND ((AVG(current_lawn_price)
             / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
FROM   work
JOIN   timesheet USING (date_linkid)
JOIN   client    USING (clientid)
WHERE  employeeid = 1
AND    workid < 557 
AND    workid > 188
GROUP  BY clientid  -- no need to GROUP BY employeeid while filtering single value
ORDER  BY clientid
)

UNION ALL
(
SELECT employeeid, work.clientid
     , ROUND ((AVG(current_lawn_price)
             / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
FROM   work
JOIN   timesheet USING (date_linkid)
JOIN   client    USING (clientid)
WHERE  employeeid = 3
GROUP  BY clientid
ORDER  BY clientid
)

UNION ALL
(
SELECT employeeid, work.clientid
     , ROUND ((AVG(current_lawn_price)
             / AVG(extract(epoch FROM job_finish - job_start)))::numeric / 79, 2) AS under_over_1
FROM   work
JOIN   timesheet USING (date_linkid)
JOIN   client    USING (clientid)
WHERE  employeeid = 2
GROUP  BY clientid
ORDER  BY clientid
);
-- no outer ORDER BY required

Keep ORDER BY per SELECT and add parentheses to fix the syntax. UNION ALL (as opposed to UNION) simply appends results preserving the order of individual SELECTs. This should be cheaper than ordering the whole set after UNION ALL. And you did want to keep using queries individual SELECT queries "as is" ...

Most of the other advice above applies accordingly.

Aside: make it a habit to use table aliases and table-qualify all columns in queries joining multiple tables. Much more robust against later changes and easier to read / debug.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for the detailed explanation. I went with your first select statement. I had to modify it due to the following error: ```ERROR: function round(double precision, integer) does not exist LINE 2: , ROUND (AVG(current_lawn_price)::numeric ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.``` When I used my original ROUND(.... it worked fine so thanks. – Christian Hick Dec 16 '19 at 01:04
  • @ChristianHick: I fixed the expression by casting the result of the division to `numeric`. If you want to do the division in `numeric` (slower, more precise), cast both averages to `numeric` instead. – Erwin Brandstetter Dec 16 '19 at 01:33
  • I ran the command with your fix. It works without error. I am however not able to use epoch as type is time without time zone (see comment above) so I left it as I had it in my query but changed CAST() AS numeric to ::numeric. Works fine. – Christian Hick Dec 16 '19 at 20:44
  • @ChristianHick: `::numeric` is just shorthand syntax, see: https://stackoverflow.com/a/13676871/939860. And my expression should just work for data type `time`. See: *db<>fiddle [here](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=1ed03f8bbc344e1631d7b4f6de19e11e)*. Substantially simpler and faster. Just sayin. – Erwin Brandstetter Dec 18 '19 at 03:32
  • If I run your query (the first one) as is, all rows for under_over_1 = 0.00. So epoch does not work. job_finish and job_start are not timestamps, they are hours and minute columns, e.g.: 00:23:00, 01:11:00, 25:12:00 etc. There are not seconds for epoch to draw on. – Christian Hick Dec 18 '19 at 17:23
  • You said `type is time without time zone`. But `25:12:00` is not a valid `time` Would work as `interval`. Please remember to disclose your actual table definition (`CREATE TABLE` statement) next time. – Erwin Brandstetter Dec 18 '19 at 17:40