-1

I am using Proc SQL within SAS.

When I use the GROUP BY statement in my main query, I get this error:
NOTE: A GROUP BY clause has been discarded because neither the SELECT clause nor the optional HAVING clause of the associated table-expression referenced a summary function.

SELECT
    HP.Area
    ,HP.Name
    ,HP.NPI
    ,FACILITIES.ID
    ,(SELECT COUNT(*) FROM EVAL.CITATIONS C
     WHERE C.ID = CITATIONS.ID
     ) AS Total_Citations
FROM EVAL.HP HP
LEFT JOIN EVAL.FACILITIES FACILITIES
    ON FACILITIES.NPI = HP.NPI
LEFT JOIN EVAL.CITATIONS CITATIONS
    ON CITATIONS.ID = FACILITIES.ID
GROUP BY CITATIONS.ID

When I run this program:

  1. I get duplicate results
  2. The Total_Citations counts all rows in the citations table because it's ignoring the group by statement.

Output:

HP.Area HP.Name HP.NPI FACILITIES.ID Total_Citations
AV OMG Inc. 1234 001 17026
AV OMG Inc. 1234 001 17026
AV Why 1241 512 17026
AV Why 1241 512 17026
BP Dis 8305 643 17026
BP Happening 8221 346 17026

It should look like:

HP.Area HP.Name HP.NPI FACILITIES.ID Total_Citations
AV OMG Inc. 1234 001 14
AV Why 1241 512 0
BP Dis 8305 643 0
BP Happening 8221 346 36

The HP table is my main table and I want to left join FACILITIES and CITATIONS tables. FACILITES has the unique identifier (NPI) that connects HP and CITATIONS tables together. CITATIONS has a row for each citation for every facility for a given time period. I am trying to get the total number of citations from CITATIONS per ID.

philipxy
  • 14,867
  • 6
  • 39
  • 83
  • 1
    Your current query makes no sense. Please show sample input and the outpu you want. – Tim Biegeleisen Nov 16 '22 at 04:32
  • @TimBiegeleisen I'd be happy to. Isn't the sample output the table I included of what I was expecting? What do you mean by sample input? Sample of the tables I'm using? – Melanie Kim Nov 16 '22 at 04:35
  • 1
    Debug questions require a [mre]. When you pin down your issue research it. [How much research effort is expected of Stack Overflow users?](https://meta.stackoverflow.com/q/261592/3404097) Asking for off-site resources is off-topic. Please avoid social & meta commentary. [ask] [Help] [Why is “Can someone help me?” not an actual question?](https://meta.stackoverflow.com/q/284236/3404097) – philipxy Nov 16 '22 at 06:09
  • 1
    Please ask 1 specific researched non-duplicate question. Please either ask about 1 bad definition/query/function with the obligatory [mre] & why you think it should return something else at the 1st subexpression that it doesn't give what you expect, justified by reference to authoritative documentation, or ask about your overall goal giving working parts you can do & ideally a [mre]. But please ask about the former 1st because misconceptions in the former will get in the way of understanding the latter. And bad code doesn't tell us what you wish it would do. [ask] [Help] – philipxy Nov 16 '22 at 06:29
  • Why do you get results at all when the code gives you an error message? Is there a note in the log that this query requires remerging the data? Because a "normal" group-by code is like – gregor Nov 16 '22 at 07:12
  • A [mre] includes cut & paste & runnable code including initialization; desired & actual output (including verbatim error messages); tags & versions; clear specification & explanation. For SQL include DDL & tabular initialization code. For debug that includes the least code you can give that is code that you show is OK extended by code that you show is not OK. [ask] [Help] When you get a result you don't expect, pause your overall goal, chop to the 1st subexpression with unexpected result & say what you expected & why, justified by documentation. (Debugging fundamental.) – philipxy Nov 19 '22 at 21:15
  • Please don't put answers in question posts, they belong in answer posts. But since this is a duplicate, answers are at the Q&A in the system duplicate notice. I edited the answer out of this. Glad you found how to click to agree on the duplicate. I's good you put in more towards a [mre]. See the link & my summary about how to improve it. – philipxy Nov 20 '22 at 06:19

1 Answers1

0

I suggest the following as a solution:

proc sql;
SELECT
   HP.Area
   ,HP.Name
   ,HP.NPI
   ,HP.ID
   ,COUNT(*) AS Total_Citations
FROM EVAL.HP
LEFT JOIN EVAL.CITATIONS 
   ON CITATIONS.ID = HP.ID
GROUP BY HP.Area, HP.Name, HP.NPI, HP.ID;
quit;

Here, the HP table is used to get the area, name etc. and the table is joined with the CITATIONS table. Your subquery is not necessary because the join will already give you the wanted number of citations per ID. If you want to have Area, Name and ID in your resulting table then you should add these columns in the group-by clause, as well. I could not see the relevance of your third table but if you need columns from the FACILITIES table then you can join this table and the wanted columns but put these columns also in the group-by. Note: If there are more than one entry for an ID in the FACILITIES table then you will get duplicates in your result again.

gregor
  • 289
  • 6