2

I am using this query to find the unique records by latest date using postgresql. The error I am having is "aggregate functions are not allowed in WHERE". How to fix error “aggregate functions are not allowed in WHERE” Following this link I have tried to use inner select function. But this did not work. Please help me to edit the query. I am using PgAdmin III as client.

SELECT Distinct t1.pa_serial_
    ,t1.homeownerm_name
    ,t1.districtvdc
    ,t1.date as firstrancheinspection_date
    ,t1.status
    ,t1.name_of_data_collector
    ,t1.fulcrum_id
    ,first_tranche_inspection_v2_reporting_questionnaire.date_reporting
From first_tranche_inspection_v2 t1
LEFT JOIN first_tranche_inspection_v2_reporting_questionnaire ON (t1.fulcrum_id = first_tranche_inspection_v2_reporting_questionnaire.fulcrum_parent_id)
where first_tranche_inspection_v2_reporting_questionnaire.date_reporting = (
        select Max(first_tranche_inspection_v2_reporting_questionnaire.date_reporting)
        from first_tranche_inspection_v2
        where first_tranche_inspection_v2.pa_serial_ = t1.pa_serial_
        );
Community
  • 1
  • 1
khushbu
  • 567
  • 2
  • 8
  • 24
  • 2
    In your subquery you select from `first_tranche_inspection_v2` but the MAX function relates to a field in `first_tranche_inspection_v2_reporting_questionnaire`. Is this a typo? Or what do you want to achieve there? – Thorsten Kettner Jan 31 '17 at 09:52
  • Yes this is my requirement so I have joined these two tables – khushbu Jan 31 '17 at 10:36
  • 1
    No, I mean MAX is meant to aggregate the table in question, so consequently the field you use MAX on would have to be a field of that table. As it isn't, you are not aggregating the table in the subquery, but simply refer to a maximum that you would have to determine outside of it. But outside of the subquery there is no aggregation. In short: you are doing something very wrong here. The question is: what is it you want to do? – Thorsten Kettner Jan 31 '17 at 10:59
  • what I am trying to do is -The info like pa_serial_, homeownerm_name, status..etc are obtained from first_tranche_inspection_v2 but other few columns are obtained from first_tranche_inspection_v2_reporting_questionnaire and I want to obtain the latest records based on date_reporting from first_tranche_inspection_v2_reporting_questionnaire table as this table contains same information on different dates and I need to obtain unique latest records – khushbu Jan 31 '17 at 11:09
  • 1
    _Show_ us by way of sample data. – Tim Biegeleisen Jan 31 '17 at 11:15
  • You are using the wrong table name in your subquery. (You probably got confused by the very long names, or your keyboard wore out) – joop Jan 31 '17 at 12:40

2 Answers2

3

You want to join the latest reporting questionaire per inspection. In PostgreSQL you can use DISTINCT ON for this:

select fti.*, rq.* 
from first_tranche_inspection_v2 fti
left join
(
  select distinct on (fulcrum_parent_id) *
  from first_tranche_inspection_v2_reporting_questionnaire
  order by fulcrum_parent_id, date_reporting desc
) rq on rq.fulcrum_parent_id = fti.fulcrum_id;

Or use standard SQL's ROW_NUMBER:

select fti.*, rq.* 
from first_tranche_inspection_v2 fti
left join
(
  select 
    ftirq.*, 
    row_number() over (partition by fulcrum_parent_id order by date_reporting desc) as rn
  from first_tranche_inspection_v2_reporting_questionnaire ftirq
) rq on rq.fulcrum_parent_id = fti.fulcrum_id and rq.rn = 1;

What you were trying to do should look like this:

select fti.*, rq.* 
from first_tranche_inspection_v2 fti
left join first_tranche_inspection_v2_reporting_questionnaire rq
  on rq.fulcrum_parent_id = fti.fulcrum_id
  and (rq.fulcrum_parent_id, rq.date_reporting) in
  (
    select fulcrum_parent_id, max(date_reporting)
    from first_tranche_inspection_v2_reporting_questionnaire
    group by fulcrum_parent_id
  );

This works, too, and only has the disadvantage that you read the table first_tranche_inspection_v2_reporting_questionnaire twice.

Thorsten Kettner
  • 89,309
  • 7
  • 49
  • 73
  • I am using the first one...this works as a charm ..thanks – khushbu Feb 02 '17 at 04:21
  • What if I need the count of records where rq.fulcrum_parent_id = fti.fulcrum_id in the same query for single record? – khushbu Mar 06 '17 at 09:03
  • I think the third query you have written here would be useful to do this. What I also need now is the count of records for a unique record in fulcrum_id of first_tranche_inspection_v2 which matches fulcrum_parent_id of first_tranche_inspection_v2_reporting_questionnaire? Please help – khushbu Mar 06 '17 at 11:03
  • select fti.*, rq.* ,count from first_tranche_inspection_v2 fti left join first_tranche_inspection_v2_reporting_questionnaire rq on rq.fulcrum_parent_id = fti.fulcrum_id and (rq.fulcrum_parent_id, rq.date_reporting) in ( select count(fulcrum_parent_id) as count, fulcrum_parent_id,max(date_reporting) from first_tranche_inspection_v2_reporting_questionnaire group by fulcrum_parent_id ); – khushbu Mar 06 '17 at 11:07
  • I am trying the above code but it says subquery has too many columns. – khushbu Mar 06 '17 at 11:08
1

DISTINCT often ends up being implemented with a GROUP BY query in many RDBMS. What I think is happening in your current query is that there is already an implicit aggregation involving the columns in your SELECT. Hence, the correlated subquery involving MAX() actually is an aggregation because of the DISTINCT.

One quick workaround might be to perform the original query without DISTINCT, then subquery the result set to retain only distinct records:

WITH cte AS (
    SELECT t1.pa_serial_,
           t1.homeownerm_name,
           t1.districtvdc,
           t1.date as firstrancheinspection_date,
           t1.status,
           t1.name_of_data_collector,
           t1.fulcrum_id,
           t2.date_reporting
    FROM first_tranche_inspection_v2 t1
    LEFT JOIN first_tranche_inspection_v2_reporting_questionnaire t2
        ON t1.fulcrum_id = t2.fulcrum_parent_id
    WHERE t2.date_reporting = (SELECT MAX(t.date_reporting)
                               FROM first_tranche_inspection_v2 t
                               WHERE t.pa_serial_ = t1.pa_serial_)
);

SELECT DISTINCT t.pa_serial_,
                t.homeownerm_name,
                t.districtvdc,
                t.firstrancheinspection_date,
                t.status,
                t.name_of_data_collector,
                t.fulcrum_id,
                t.date_reporting
FROM cte t

Note that I went ahead and added an alias to the second table in your join, which leaves the query much easier to read.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • I am getting this error "ERROR: column t.date_reporting does not exist LINE 13: WHERE t2.date_reporting = (SELECT MAX(t.date_reporting)". Actually, date_reporting is coming from the secondary table "first_tranche_inspection_v2_reporting_questionnaire" and irst_tranche_inspection_v2 is my primary table. – khushbu Jan 31 '17 at 10:24
  • 1
    That makes no sense. At this point, you should update your question with sample data and desired output to be clear. – Tim Biegeleisen Jan 31 '17 at 10:39
  • "ERROR: missing FROM-clause entry for table "first_tranche_inspection_v2_reporting_questionnaire" LINE 3: Max(first_tranche_inspection_v2_reporting_questionnaire)" – khushbu Jan 31 '17 at 10:39
  • Again, what you are trying to do makes no sense. – Tim Biegeleisen Jan 31 '17 at 10:41
  • what I am trying to do is The info like pa_serial_, homeownerm_name, status..etc are obtained from first_tranche_inspection_v2 and other few columns are obtained from first_tranche_inspection_v2_reporting_questionnaire and I want to obtain the latest records based on date_reporting from first_tranche_inspection_v2_reporting_questionnaire table – khushbu Jan 31 '17 at 10:50