-1

I have a query to show the average length of all the proteins which also exists in a second table mrna_pieces:

SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id = protein.protein_id)

I also want to show the average for all the proteins which don't exist in the second table.

SELECT AVG(protein_length)
FROM protein
WHERE exists
(SELECT protein_id FROM mrna_pieces WHERE mrna_brokstukken.protein_id != protein.protein_id)

But I want these 2 parts in one table like this table example
I tried:

SELECT AVG(eiwit_lengte) AS avglengthwith, AVG(eiwit_lengte) AS avglengthwithout
FROM eiwit
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id != eiwit.eiwit_id)
WHERE exists
(SELECT eiwit_id FROM mrna_brokstukken WHERE mrna_brokstukken.eiwit_id = eiwit.eiwit_id)

But that gives me the following error:

ERROR: pq: syntax error at or near "WHERE"

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Koenvc
  • 3
  • 2
  • Four answers so far and **all wrong**. I added a fifth. Your second query does not do what you want. You'd need `NOT EXISTS` instead. But we can optimize ... – Erwin Brandstetter Jan 13 '23 at 00:48
  • You need to add a ddl for yourr table with data or a dbfifdle, my answer works fine if your queries have the correct result. Am average without group by returns a scalar value, so please add a [mre] – nbk Jan 13 '23 at 01:24

3 Answers3

1

It should be cheapest to compute the "exists" value once per row in a subquery, then two conditional aggregates in the outer SELECT:

SELECT avg(protein_length) FILTER (WHERE p_exists) AS avg_len_exists
     , avg(protein_length) FILTER (WHERE NOT p_exists) AS avg_len_not_exists
FROM  (
   SELECT protein_length
        , EXISTS (SELECT FROM mrna_pieces m WHERE m.protein_id = p.protein_id) AS p_exists
   FROM   protein p
   ) sub;

About the aggregate FILTER clause (which can be combined with the OVER clause of window functions):

Or, shorter if two result rows are acceptable:

SELECT EXISTS (SELECT FROM mrna_pieces m WHERE m.protein_id = p.protein_id) AS p_exists
     , avg(protein_length) avg_len
FROM   protein p
GROUP  BY 1;

Should be slightly faster, yet.

Either way, if your table mrna_pieces is big, an index with protein_id as leading or only column helps performance (a lot).

Note that the second query in your question is not doing what you are asking for. EXISTS ( ... WHERE m.protein_id != p.protein_id) is true if at least one row in table mrna_pieces does not match, which is always the case if there are at least two distinct values in m.protein_id. You would want NOT EXISTS ( ... WHERE m.protein_id = p.protein_id). But use the cheaper query I suggested instead.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

Just select the two at the same time:

select (
  select AVG(protein_length)
  FROM protein
  WHERE exists
  (select protein_id from mrna_pieces where mrna_brokstukken.protein_id = protein.protein_id)
) as avgwith,
(
  select AVG(protein_length)
  FROM protein
  WHERE exists
  (select protein_id from mrna_pieces where mrna_brokstukken.protein_id != protein.protein_id)
)
as avgwithout
politinsa
  • 3,480
  • 1
  • 11
  • 36
  • This copies the incorrect 2nd query from the question, resulting in nonsense for `avgwithout`: the avg len of all rows in table `protein` if at least two distinct values exist in table `mrna_pieces`. – Erwin Brandstetter Jan 13 '23 at 00:46
0

as both queries will gove back a scalar value you can simply use bot as subqueries

SELECT

    (select AVG(protein_length)
    FROM protein
    WHERE exists
    (select protein_id from mrna_pieces where mrna_brokstukken.protein_id = protein.protein_id) ) AVG_1,
    (select AVG(protein_length)
    FROM protein
    WHERE exists
    (select protein_id from mrna_pieces where mrna_brokstukken.protein_id != protein.protein_id)) AVG_2
nbk
  • 45,398
  • 8
  • 30
  • 47