Compute averages for two distinct sets from one table in a single query

Question

I have a query to show the average length of all the proteins which also exists in a second table mrna_pieces:

SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id = protein.protein_id)

I also want to show the average for all the proteins which don't exist in the second table.

SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id != protein.protein_id)

But I want these 2 parts in one table like this table example
I tried:

SELECT AVG(eiwit_lengte) AS avglengthwith, AVG(eiwit_lengte) AS avglengthwithout
FROM eiwit
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id != eiwit.eiwit_id)
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id = eiwit.eiwit_id)

But that gives me the following error:

ERROR: pq: syntax error at or near "WHERE"

Four answers so far and **all wrong**. I added a fifth. Your second query does not do what you want. You'd need `NOT EXISTS` instead. But we can optimize ... — Erwin Brandstetter, Jan 13 '23 at 00:48
You need to add a ddl for yourr table with data or a dbfifdle, my answer works fine if your queries have the correct result. Am average without group by returns a scalar value, so please add a [mre] — nbk, Jan 13 '23 at 01:24

Erwin Brandstetter · Accepted Answer · 2023-01-14T17:37:06.340

It should be cheapest to compute the "exists" value once per row in a subquery, then two conditional aggregates in the outer SELECT:

SELECT avg(protein_length) FILTER (WHERE p_exists) AS avg_len_exists
     , avg(protein_length) FILTER (WHERE NOT p_exists) AS avg_len_not_exists
FROM  (
   SELECT protein_length
        , EXISTS (SELECT FROM mrna_pieces m WHERE m.protein_id = p.protein_id) AS p_exists
   FROM   protein p
   ) sub;

About the aggregate FILTER clause (which can be combined with the OVER clause of window functions):

Aggregate columns with additional (distinct) filters

Or, shorter if two result rows are acceptable:

SELECT EXISTS (SELECT FROM mrna_pieces m WHERE m.protein_id = p.protein_id) AS p_exists
     , avg(protein_length) avg_len
FROM   protein p
GROUP  BY 1;

Should be slightly faster, yet.

Either way, if your table mrna_pieces is big, an index with protein_id as leading or only column helps performance (a lot).

Note that the second query in your question is not doing what you are asking for. EXISTS ( ... WHERE m.protein_id != p.protein_id) is true if at least one row in table mrna_pieces does not match, which is always the case if there are at least two distinct values in m.protein_id. You would want NOT EXISTS ( ... WHERE m.protein_id = p.protein_id). But use the cheaper query I suggested instead.

score 0 · Answer 2 · answered Jan 12 '23 at 20:39

0

Just select the two at the same time:

select (
  select AVG(protein_length)
  FROM protein
  WHERE exists
  (select protein_id from mrna_pieces where mrna_brokstukken.protein_id = protein.protein_id)
) as avgwith,
(
  select AVG(protein_length)
  FROM protein
  WHERE exists
  (select protein_id from mrna_pieces where mrna_brokstukken.protein_id != protein.protein_id)
)
as avgwithout

answered Jan 12 '23 at 20:39

politinsa

3,480
1
11
36

This copies the incorrect 2nd query from the question, resulting in nonsense for `avgwithout`: the avg len of all rows in table `protein` if at least two distinct values exist in table `mrna_pieces`. – Erwin Brandstetter Jan 13 '23 at 00:46

score 0 · Answer 3 · answered Jan 12 '23 at 20:40

0

as both queries will gove back a scalar value you can simply use bot as subqueries

SELECT

    (select AVG(protein_length)
    FROM protein
    WHERE exists
    (select protein_id from mrna_pieces where mrna_brokstukken.protein_id = protein.protein_id) ) AVG_1,
    (select AVG(protein_length)
    FROM protein
    WHERE exists
    (select protein_id from mrna_pieces where mrna_brokstukken.protein_id != protein.protein_id)) AVG_2

answered Jan 12 '23 at 20:40

nbk

45,398
8
30
47

Same problem as https://stackoverflow.com/questions/75102035/join-2-sql-avg-table-in-one-query/75103681#comment132535146_75102073 – Erwin Brandstetter Jan 13 '23 at 00:47

Compute averages for two distinct sets from one table in a single query

3 Answers3