Select multiple columns from a table but group by one which is always unique

Question

In the UI I have a table where I'm showing data grouped by shipday which is days of the week. To do that in the hasura I've created a PostgreSQL view -

CREATE
OR REPLACE VIEW "public"."view_shipday" AS
SELECT
  shipdata.shipday,
  count(*) AS count,
  sum(shipdata.delivered) AS delivered,
  sum(shipdata.transit) AS transit,
  sum(shipdata.refused) AS refused,
  sum(shipdata.undeliverable) AS undeliverable,
  sum((shipdata.undeliverable + shipdata.refused)) AS total_exceptions
FROM
  shipdata
GROUP BY
  shipdata.shipday;

Now on the UI, I've two filters carrier and shipdate when the user selects filter like shipdate or carrier or both I want to group the data by selected filter with shipday but I want shipday to be always unique(this is important). I've tried creating below view but this creates duplicate shipday .

CREATE
OR REPLACE VIEW "public"."view_shipday_and_filter" AS
SELECT
  shipdata.shipday,
  date(shipdata.shipdate),
  shipdata.carrier,
  count(*) AS count,
  sum(shipdata.delivered) AS delivered,
  sum(shipdata.transit) AS transit,
  sum(shipdata.refused) AS refused,
  sum(shipdata.undeliverable) AS undeliverable,
  sum((shipdata.undeliverable + shipdata.refused)) AS total_exceptions
FROM
  shipdata
GROUP BY
  shipdata.shipday,
  date(shipdata.shipdate),
  shipdata.carrier;

AFAIK hasura doesn't allow to do group by with their graphql queries. I don't want to do the grouping in the client side since the data size is really big and it will slow down the app. So in the database I want to create one or multiple(if needed) views that will group the data handling above mentioned cases so that shipday always remains unique. If there is other option to achieve this without creating view I'm open for that too.

PS I'm using hasura graphql queries. I don't want to run separate SQL statement in the client side to fetch data.

`Group by shipday` you get one row per `shipday`, `group by shipday, shipdate` you get one row per `shipday` and `shipdate`, , `group by shipday, shipdate,carrier` you get one row per `shipday` and `shipdate` and `carrier`. Does this answer your question? — Marmite Bomber, Mar 01 '23 at 08:29
No. I want `shipday` to be unique. If I `group by shipday, shipdate` it will create multiple same `shipday` and same applies for `group by shipday, shipdate, carrier`. — mused, Mar 01 '23 at 08:34
For a given shipdate & carrier, which shipday do you want you to keep? What is stopping you from asking for that one? PS Please clarify via edits, not comment. PS Debug questions require a [mre]. That includes a clear specification. [mre] [ask] [Help] PS Basic questions are faqs. And research is expected. — philipxy, Mar 01 '23 at 11:00
Does this answer your question? [Fetch the rows which have the Max value for a column for each distinct value of another column](https://stackoverflow.com/questions/121387/fetch-the-rows-which-have-the-max-value-for-a-column-for-each-distinct-value-of) — philipxy, Mar 01 '23 at 11:00
^^I want a `shipday` with the correct aggregation of the corresponding columns. To rephrase my problem. with the 1st view in the question I can't apply `shipdate` and `carrier` filter because they are not available in the view that's why I've created the 2nd view in the question to apply `shipdate` and `carrier` filter. Now the problem with 2nd view is `shipday` column is becoming duplicate but I want `shipday` to be unique. ^No it's a separate problem. — mused, Mar 01 '23 at 17:32
"Please clarify via edits, not comments." Although what you put in yoru comments isn't clear. Use enough words, sentences & references to parts of examples to clearly & fully say what you mean. "[mre]" PS See [How do comment replies work?](https://meta.stackexchange.com/q/43019/266284) to learn to use `@x` to notify 1 non-sole non-poster commenter `x` per comment about that comment. Posters, sole commenters & followers of posts always get notified. Without `@` other commenters get no notification. — philipxy, Mar 01 '23 at 21:48

score 0 · Answer 1 · answered Mar 01 '23 at 07:17

0

There is no performance difference between querying a view and running the query that defines the view. In fact, PostgreSQL replaces the view with its definition when it runs the query.

You could use the view at the end of your question and add an additional GROUP BY and summation to the query that uses the view, but that would make the GROUP BY in the view pointless and will actually make the query slower.

My opinion is that you have nothing to gain by using a view in this case. Since the filter condition needs to be applied before grouping, your only option is to directly query shipdata and do the grouping and summation in your application query.

You mention nothing about that, but if part of your problem is that the performance of this query is too slow, you have to use a different tool than a view. In that case, an option is to use a materialized view defined using the "partially aggregated" query and do the final filtering and grouping in your application query. The pre-aggregated materialized view will be smaller than the base table, so that query will perform better. The downside is that the materialized view doesn't change if the data in shipdata do, so you have to REFRESH it regularly and live with slightly stale data in between.

answered Mar 01 '23 at 07:17

Laurenz Albe

209,280
17
206
263

Yes, fetching data directly from `shipdata` is a performance issue because I've to do the grouping every time a filter is chosen that's why I've chosen view. What you are saying to do with `materialized view` I think it's possible to do the same with the 2nd view in the question. But I want all these grouping and aggregation to happen in the database because I've to do the same thing with other columns from `shipdata` table. If there is no other option I'll do the last step of grouping in the client side. – mused Mar 01 '23 at 08:15
The query always happens on the database side, never in the client. Don't worry. – Laurenz Albe Mar 01 '23 at 08:25
Another point, since I'm using hasura, they don't allow to do the grouping with their graphql query. – mused Mar 01 '23 at 08:25
Tools like that always fetter you to some extent. If you cannot make it within the limits of Hasura, write an SQL statement yourself. After all, it is most important that the application performs well. But I cannot imagine that Hasura doesn't allow you to use `GROUP BY`. If that were the case, the tool would be nigh unusable. – Laurenz Albe Mar 01 '23 at 09:35

score 0 · Answer 2 · answered Mar 01 '23 at 10:28

To get an unfiltered shipday report you must use this query (see sample data below)

select shipday, count(*)
from shipday
group by 1 order by 1;

shipday|count|
-------+-----+
      1|    2|
      2|    1|

To get a report with a filter say for carrier = 'a' you must add a where predicate but still group by shipday

select shipday, count(*)
from shipday
where carrier = 'a'
group by 1 order by 1;
shipday|count|
-------+-----+
      1|    1|
      2|    1|

There is no possibility AFAIK to get those two queries in a view in PostgreSQL. You will need a feature called parametrised view provided by some other RDBMS.

Anyway you may use a function to elegantly cover this feature.

create or replace function select_shipday(p_carrier varchar(10) default null)
returns table (shipday int, cnt int) language sql as $$
    select shipday, count(*) cnt
    from shipday
    where carrier = p_carrier or p_carrier is null
   group by 1;
$$;

The function gets a parameter to filter the carrier (if defined) the default is to get all carriers.

Sample calls

-- all carriers
select * from select_shipday();
-- only carrier a
select * from select_shipday('a');

This is of course the simplest example, but hopefully you get the idea how to add other paramaters or other function for more advanced filters.

Also you may want to add an if then logic in the function and define dedicated queries for different kind of filters.

Sample data

create table shipday as
select * from (values 
(1, 'a'),
(1, 'b'),
(2, 'a')
) tab (shipday, carrier)

So far my understanding is running a function using 2nd view in my question as base table might solve my problem. Can I apply same aggregation inside the function as I did in my view? — mused, Mar 01 '23 at 17:44
You may of course use a *pre-aggregated* view (containing all the columns used for a filter) instead of the base table. But as pointed out in the other answer this will have no effect. A performance boost will bring only a *materialized view*. If *your problem* is to get a *simple query* without `group by` making the filter and the aggregation, then yes the above function is the solution (and you may accept the answer @Sakin ;) — Marmite Bomber, Mar 02 '23 at 07:38

Select multiple columns from a table but group by one which is always unique

2 Answers2