0

Following tables:

CREATE TEMPORARY TABLE guys ( guy_id integer primary key, guy text );
CREATE TEMPORARY TABLE sales ( log_date date, sales_guys integer[], sales smallint );
INSERT INTO guys VALUES(1,'john'),(2,'joe');
INSERT INTO sales VALUES('2016-01-01', '{1,2}', 2),('2016-01-02','{1,2}',4);

Following query works great to show names on a given date:

SELECT log_date, sales_guys, ARRAY_AGG(guy), sales 
FROM sales 
JOIN guys ON 
   guys.guy_id = ANY(sales.sales_guys) 
GROUP BY log_date, sales_guys, sales 
ORDER BY log_date ASC;

  log_date  | sales_guys | array_agg  | sales 
------------+------------+------------+-------
 2016-01-01 | {1,2}      | {john,joe} |     2
 2016-01-02 | {1,2}      | {john,joe} |     4

Following query problematically gives me a name per date per guy, so here each name twice, and so on):

SELECT sales_guys, ARRAY_AGG(guy), SUM(sales) AS sales
FROM sales
JOIN guys ON guys.guy_id = ANY(sales.sales_guys)
GROUP BY sales_guys;

Yields:

 sales_guys |      array_agg      | sales 
------------+---------------------+-------
 {1,2}      | {john,joe,john,joe} |    12

Is there a way to somehow reduce the ARRAY_AGG call to give only the unique names?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Wells
  • 10,415
  • 14
  • 55
  • 85
  • Please clarify what you want to aggregate *exactly*. Sales for the same team? Would you group `{1,2}` with `{2,1}`? And always declare your version of Postgres. – Erwin Brandstetter Nov 25 '16 at 20:03

2 Answers2

2

There is no kind of order you can trust without ORDER BY. Except that elements of arrays, when unnested, come in array order. If your query does more with the result, it may be re-ordered, though.

You an simply add ORDER BY to any aggregate function in Postgres:

SELECT s.sales_guys, ARRAY_AGG(DISTINCT g.guy ORDER BY g.guy) AS names, SUM(s.sales) AS sum_sales
FROM   sales s
JOIN   guys  g ON g.guy_id = ANY(s.sales_guys)
GROUP  BY s.sales_guys;

But that's obviously not the original order of array elements. And the query has other issues ... Neither IN nor = ANY() care about order of elements in the set, list or array on the right side:

Proper solution

For this task (attention to the details!):

Get the total sales per array sales_guys, where the order of elements makes a difference (arrays '{1,2}' and '{2,1}' are not the same) and sales_guys has neither duplicate nor NULL elements. Add an array of resolved names in matching order.

Use unnest() with WITH ORDINALITY. and aggregate arrays before you resolve names, that's cheaper and less error prone.

SELECT s.*, g.
FROM  (
   SELECT sales_guys, sum (sales) AS total_sales                -- aggregate first in subquery
   FROM   sales
   GROUP  BY 1
   ) s
, LATERAL (
   SELECT array_agg(guy ORDER BY ord) AS names                  -- order by original order
   FROM   unnest(s.sales_guys) WITH ORDINALITY sg(guy_id, ord)  -- with order of elements
   LEFT   JOIN guys g USING (guy_id)                            -- LEFT JOIN to add NULL for missing guy_id
   ) g;

The LATERAL subquery can be joined with unconditional CROSS JOIN - comma (,) is shorthand notation - because the aggregate in the subquery guarantees a result for every row. Else you'd use LEFT JOIN LATERAL .. ON true.

Detailed explanation:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

You can use DISTINCT inside of the aggregate:

SELECT sales_guys, ARRAY_AGG(DISTINCT guy), SUM(sales) AS sales FROM sales JOIN guys ON guys.guy_id = ANY(sales.sales_guys) GROUP BY sales_guys;
clemens
  • 16,716
  • 11
  • 50
  • 65
  • dang, that's obvious. Woof. Is there some kind of order you can trust in the DISTINCT call? Would they be ordered here by `guy_id` in `guys` or `guy` in `guys` or is it effectively random? – Wells Nov 25 '16 at 18:16
  • I think there is no defined order. You can fetch the _guys_ in the join by an ordered subquery or view first to get an ordered array. – clemens Nov 25 '16 at 18:27
  • hmm, I kind of hoped to get a guy_id array, and a guy array such that I could trust guy_id[1] == guy[1] but it doesn't seem possible w/o a subquery. – Wells Nov 25 '16 at 18:36