1

I'm little lost trying to solve a problem. At first I've this 5 tables:

CREATE TABLE DOCTOR (
    Doc_Number INTEGER,
    Name    VARCHAR(50) NOT NULL,
    Specialty   VARCHAR(50) NOT NULL,
    Address VARCHAR(50) NOT NULL,
    City    VARCHAR(30) NOT NULL,
    Phone   VARCHAR(10) NOT NULL,
    Salary  DECIMAL(8,2) NOT NULL,
    DNI     VARCHAR(10) UNIQUE,
    CONSTRAINT pk_Doctor PRIMARY KEY (Doc_Number),
    CONSTRAINT ck_Salary CHECK (Salary >0)
  );

CREATE TABLE PATIENT (
    Pat_Number  INTEGER,
    Name    VARCHAR(50) NOT NULL,
    Address     VARCHAR(50) NOT NULL,
    City        VARCHAR(30) NOT NULL,
    DNI     VARCHAR(10) UNIQUE,
    CONSTRAINT pk_PATIENT PRIMARY KEY (Pat_Number)
    );

CREATE TABLE VISIT (
    Doc_Number    INTEGER,
    Pat_Number    INTEGER,
    Visit_Date    DATE,
    Price     DECIMAL(7,2),
    Last_Drug     VARCHAR(50),
    CONSTRAINT Visit_pk PRIMARY KEY (Doc_Number, Pat_Number, Visit_Date),
    CONSTRAINT ck_Price CHECK (Price >0),
    CONSTRAINT Visit_Doctor_fk FOREIGN KEY (Doc_Number) REFERENCES DOCTOR(Doc_Number),
    CONSTRAINT Visit_PATIENT_fk FOREIGN KEY (Pat_Number) REFERENCES PATIENT(Pat_Number)
  );

CREATE TABLE PRESCRIPTION (
    Presc_Number    INTEGER,
    Drug        VARCHAR(50) NOT NULL,
    Doc_Number      INTEGER NOT NULL,
    Pat_Number      INTEGER NOT NULL,
    Visit_Date      DATE NOT NULL,
    CONSTRAINT Prescription_pk PRIMARY KEY (Presc_Number),
    CONSTRAINT Prescription_Visit_fk FOREIGN KEY (Doc_Number, Pat_Number, Visit_Date) REFERENCES VISIT(Doc_Number, Pat_Number, Visit_Date)
  );

CREATE TABLE VISITS_SUMMARY (
    Doc_Number      INTEGER,
    Pat_Number      INTEGER,
    Year            INTEGER,
    Drugs_Number    INTEGER,
    Visits_Number   INTEGER,
    Acum_Amount     DECIMAL(8,2),
    Last_Drug       VARCHAR(50),
    CONSTRAINT ck_Visits_Number CHECK (Visits_Number >0),
    CONSTRAINT ck_Acum_Amount CHECK (Acum_Amount >0),
    CONSTRAINT Visits_Summary_pk PRIMARY KEY (Doc_Number,   Pat_Number, Year),
    CONSTRAINT Summary_Doctor_fk FOREIGN KEY (Doc_Number) REFERENCES DOCTOR(Doc_Number),
    CONSTRAINT Summary_PATIENT_fk FOREIGN KEY (Pat_Number) REFERENCES PATIENT(Pat_Number)
);

I've filled the first 4 and I need to create a function to update the last one. The function must do:

  1. count the number of different drugs prescribed for one doctor to one patient in one year.
  2. count the number of visits of the patient with one doctor in a year
  3. add the total value of the visits for the patient with the doctor in a year
  4. return the last drug prescribed for one doctor to a patient in a year.

Also I need consider these possible errors:

  • doctor doesn't exists
  • patient doesn't exists
  • no visits for this doctor to this patient in this year

And finally save the information in the VISITS_SUMMARY table.

I've done the firsts 4 points separately in different functions using return and works:

CREATE OR REPLACE FUNCTION sum_visits (p_Doc_Number INTEGER, p_Pat_Number INTEGER, p_Year INTEGER)
RETURNS INTEGER AS $$
DECLARE
BEGIN
SELECT COUNT(Drug)INTO drugs_num
    FROM PRESCRIPTION pr
    WHERE pr.Doc_Number = p_Doc_Number AND pr.Pat_Number = p_Pat_Number AND
    (SELECT EXTRACT(YEAR FROM pr.Visit_Date)) = p_Year;
RETURN drugs_num;
END;
$$LANGUAGE plpgsql;

And the same with this other using the same function parameters only changing the return type.

SELECT COUNT(Visit_Date)INTO visits
    FROM VISIT v
    WHERE v.Doc_Number = p_Doc_Number AND v.Pat_Number = p_Pat_Number AND
    (SELECT EXTRACT(YEAR FROM v.Visit_Date)) = p_Year;

    total_price = 0.0;
    FOR visit_price IN SELECT Price FROM VISIT v
        WHERE v.Doc_Number = p_Doc_Number AND v.Pat_Number = p_Pat_Number AND
        (SELECT EXTRACT(YEAR FROM v.Visit_Date)) = p_Year LOOP
        total_price := total_price + visit_price;
        END LOOP;

    SELECT Drug INTO last_drg FROM PRESCRIPTION pr
    WHERE pr.Doc_Number = p_Doc_Number AND pr.Pat_Number = p_Pat_Number AND
        (SELECT EXTRACT(YEAR FROM pr.Visit_Date)) = p_Year AND
        Presc_Number = (SELECT MAX(Presc_Number)FROM PRESCRIPTION);

I've tried to do the exceptions using IF conditions but it doesn't work. Here is a complete example for one of the different operations of the function:

CREATE OR REPLACE FUNCTION sum_visits (p_Doc_Number INTEGER, p_Pat_Number INTEGER, p_Year INTEGER)
RETURNS void AS $$
DECLARE
    drugs_num INTEGER;
 BEGIN

    IF (PRESCRIPTION.Doc_Number NOT IN (p_Doc_Number))THEN
        RAISE EXCEPTION 
        'Doctor % doesn"t exists';
    ELSIF (PRESCRIPTION.Pat_Number NOT IN (p_Pat_Number))THEN
        RAISE EXCEPTION
        'Patient doesn"t exists';
    ELSIF((SELECT EXTRACT(YEAR FROM PRESCRIPTION.Visit_Date)) NOT IN p_Year) THEN
        RAISE EXCEPTION
        'Date % doesn"t exists'
    ELSE SELECT COUNT(Drug)INTO drugs_num
        FROM PRESCRIPTION pr
        WHERE pr.Doc_Number = p_Doc_Number AND pr.Pat_Number = p_Pat_Number AND
        (SELECT EXTRACT(YEAR FROM pr.Visit_Date)) = p_Year;
    end if;

    update VISITS_SUMMARY
    set Drugs_Number = drugs_num;

    exception
    when raise_exception THEN
    RAISE EXCEPTION' %: %',SQLSTATE, SQLERRM;
END;
$$LANGUAGE plpgsql;

I need help to use the update statement because even not considering the exceptions seems like table don't update and some help with the control exceptions.

There are some examples to fill the firsts table and call the function with this parameters (26902, 6574405, 2011)

INSERT INTO DOCTOR (Doc_number,Name,Specialty,Address,City,Phone,Salary,DNI) values
  (26902,'Dr. Alemany','Traumatologia','C. Muntaner, 55,','Barcelona','657982553',71995,'52561523T');

INSERT INTO PATIENT (Pat_Number, Name, Address, City, DNI) values
  (6574405,'Sra. Alemany ','C. Muntaner, 80','Barcelona','176784267B');

INSERT INTO VISIT (Doc_Number, Pat_Number,Visit_Date,Price) values
  (26902,6574405,'30/03/11',215);

INSERT INTO PRESCRIPTION (Presc_Number, Drug, Doc_Number, Pat_Number, Visit_Date) values
  (44,'Diclofenac',26902,6574405,'30/03/11')
, (45,'Ibuprofè',26902,6574405,'30/03/11')
, (46,'Ibuprofè',26902,6574405,'30/03/11');

I have more inserts if you want.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
alwayslearn
  • 103
  • 1
  • 9
  • Assumption: Your `UPDATE` statement in function should probably have a `WHERE` clause to identify the row you would like to update (you perhaps don't want to update the entire table in one function call). – Kamil Gosciminski Apr 24 '15 at 09:15
  • But table is empty, what can I do in the where clause? I can't say change the row where column X is ?? because it's empty, or, if I can, I don't now how. – alwayslearn Apr 24 '15 at 09:17
  • Could you please provide sample data and explanation of how exactly would you like `VISITS_SUMMARY` table to be UPDATED? I assume you would be doing this by hand or calling a function as an automated process, right? – Kamil Gosciminski Apr 24 '15 at 09:19
  • Yes, I thinked to call function by this way `SELECT * FROM sum_vists(p_Doc_Number INTEGER, p_Pat_Number INTEGER, p_Year INTEGER)` – alwayslearn Apr 24 '15 at 09:24
  • please update your question with some sample data to populate your tables. – Kamil Gosciminski Apr 24 '15 at 09:25
  • Run your inserts - they violate FK constraints. Update your statements so that data inserts are complete. – Kamil Gosciminski Apr 24 '15 at 09:35
  • ok I copy all of my inserts – alwayslearn Apr 24 '15 at 09:37
  • I can't copy all of my insetrs but I know what happens, the page thinks that date is yyyy/mm/dd but is dd/mm/yy in my case I don't know how to chage this in the web. – alwayslearn Apr 24 '15 at 09:43
  • Page has nothing to do with it. You had a `VISIT` in your script that was refering to a doctor of id `42632` that is not present in the table. It's also important to `INSERT` data to tables in correct order. Anyways, I edited your question. – Kamil Gosciminski Apr 24 '15 at 09:49
  • Yes, I've other inserts but they no fit in post, with inserts that I post you should use this parametetrs (26902, 6574405, 2011) and i think that I delete that insert with doctor 42632, sorry about that – alwayslearn Apr 24 '15 at 09:52

2 Answers2

2

Below is a function that I assumed does the job.

Some explanation might be needed, so here it is:

f_start_of_year and f_end_of_year are constructed to make a query sargable (able to use indexes to speed it up it's execution), because functions are blackboxes to Postgres optimizer, thus doing query WHERE function(visit_date) ... is not able to use index that is declared on column visit_date. For this particular case you would need to index on to_char(visit_date, 'YYYY') for example to get 2011 as character result. It's better to have one index and adjust queries to it, than the other way around. On the other hand, Postgres does evaluate right side of operator pretty quickly while left side remains so that it matches the index condition.

At the very beggining we are doing checks for existance of doctors, patients and visits.

If you would like to update statistics for all distinct doctor, patient records then a call could look like

SELECT sum_visits(doc_number, pat_number, 2011) 
FROM (
  SELECT doc_number, pat_number 
  FROM visit 
  GROUP BY 1,2
  ) foo;

In counting number of drugs I placed COUNT(DISTINCT drug) because you've said that you want to count existance of different drugs (so here if a doctor has prescribed a drug for one particular patient twice, it would only count as 1. To remove this behaviour simply delete DISTINCT clause.

Consider replacing RAISE EXCEPTION with RAISE NOTICE and RETURN clause - see manual for reference. Raising an EXCEPTION prevents from further execution of a function.

Then a function establishes operation that has to be done INSERT/UPDATE - because you might want to calculate statistics more frequently than once a year and so an INSERT statement would fail for visits_Summary_pk

As for RETURN value - you're getting exact information of what the function did and what statistics were updated/inserted for a particular row. This way you could do some logging. It could also help you in debugging.

CREATE OR REPLACE FUNCTION sum_visits (p_doc_number INTEGER, p_pat_number INTEGER, p_year INTEGER)
RETURNS text
LANGUAGE plpgsql
AS $$
DECLARE
f_start_of_year date := p_year || '-01-01';
f_end_of_year date := (p_year || '-01-01')::DATE + '1 year - 1 day'::INTERVAL;
f_drug_count integer := 0;
f_visits_count integer := 0;
f_price_sum decimal(8,2) := 0.00;
f_last_drug varchar(50);
f_check_if_record_exists boolean;
BEGIN
-- Checking
IF (SELECT count(*) FROM doctor WHERE doc_number = p_doc_number) = 0 THEN
    RAISE EXCEPTION 'Doctor % does not exist', p_doc_number;
END IF;
IF (SELECT count(*) FROM patient WHERE pat_number = p_pat_number) = 0 THEN
    RAISE EXCEPTION 'Patient % does not exist', p_doc_number;
END IF; 
IF (SELECT count(*) FROM visit WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND visit_date BETWEEN f_start_of_year AND f_end_of_year) = 0 THEN
    RAISE EXCEPTION 'There are no visits for doctor %, patient % in year %', p_doc_number, p_pat_number, p_year;
END IF;

SELECT COUNT(DISTINCT drug) INTO f_drug_count 
    FROM prescription WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND visit_date BETWEEN f_start_of_year AND f_end_of_year;
SELECT COUNT(*) INTO f_visits_count 
    FROM visit WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND visit_date BETWEEN f_start_of_year AND f_end_of_year;
SELECT SUM(price) INTO f_price_sum 
    FROM visit WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND visit_date BETWEEN f_start_of_year AND f_end_of_year;
SELECT drug INTO f_last_drug 
    FROM prescription WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND visit_date BETWEEN f_start_of_year AND f_end_of_year ORDER BY visit_date DESC, presc_number DESC LIMIT 1;

SELECT CASE WHEN COUNT(*) > 0 THEN true ELSE false END INTO f_check_if_record_exists FROM visits_summary WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND year = p_year;

IF (f_check_if_record_exists = 'f') THEN
INSERT INTO visits_summary(doc_number, pat_number, year, drugs_number, visits_number, acum_amount, last_drug)
    VALUES (p_doc_number, p_pat_number, p_year, f_drug_count, f_visits_count, f_price_sum, f_last_drug);
ELSE
UPDATE visits_summary SET 
    drugs_number = f_drug_count, visits_number = f_visits_count, acum_amount = f_price_sum, last_drug = f_last_drug
    WHERE doc_number = p_doc_number AND pat_number = p_pat_number AND year = p_year;
END IF;
RETURN CONCAT(CASE f_check_if_record_exists WHEN true THEN 'Updated' ELSE 'Inserted into' END || ' visits_summary for Doctor_ID: ',p_doc_number,' / Patient_ID: ',p_pat_number,' / Year: ',p_year,E'\n','WITH VALUES: drug_count: ',f_drug_count,', visits_count: ',f_visits_count,', price_sum: ',f_price_sum,', last_drug: ',COALESCE(f_last_drug,'none'));
END;
$$;

Some general tips:

  • Don't user varchar(n) as column type. There is no performance difference between varchar(n) and varchar, but changing the size of it later might hurt you. If you really want to limit characters that are stored in a column you are better off with varchar(n) and additional CHECK constraint - it's easy to change it in the future. Read first tip in documentation about character types
  • Consider removing CHECK constraints on visits_summary table and EXCEPTION that goes along with it. In my opinion it would be better to store it inside a table with value of 0 and later update it than not having it at all (you can gain more accurate statistics if you would ever want to aggregate and do any math by having all rows inside a table)
  • Consider adding indexes to speed up queries, for example index on visit(visit_date) would speed up queries used in function above.
  • Tip: While thinking about column order in composite indexes, always index by equality first, and then by ranges. The difference is that equality operator limits the first index column to a single value (simplified: when column values are unique). Then within the range for that value index is sorted according to a date column. The difference can of course be negligible, if a date column is very selective, but in most cases it's not. The bigger the date range becomes, the bigger the performance difference will be.

Edit: You could also replace checking part of the code by using special variable FOUND or NO_DATA_FOUND - more on that here

Kamil Gosciminski
  • 16,547
  • 8
  • 49
  • 72
  • Hey, thank you so much, it seems that works, I consider all your tips. I don't work with indexs yet but it's good to know. I've been studying sql for a month, I'm really new to this world. I'll study your function line by line to learn how to do that in other ocasion! Thank's for your patience with my problem! – alwayslearn Apr 24 '15 at 11:50
  • I'm glad I could help. If the answer helped you in some way please consider accepting it and voting :) – Kamil Gosciminski Apr 24 '15 at 11:57
1

Schema

There a a couple of interesting elements. Most strikingly: (Doc_Number, Pat_Number, Visit_Date) seems like a terrible idea for a primary key. If the doc rings twice the same day, you are screwed. Rather use a much more practical serial column as surrogate primary key:

Then you can also simplify the FK in prescription:

CREATE TABLE visit (
  visit_id serial NOT NULL PRIMARY KEY
, doc_number  int NOT NULL
, pat_number  int NOT NULL
, visit_date  date NOT NULL
, price       int  -- amount in Cent -- Can be NULL?
, last_drug   text  -- seems misplaced
, CONSTRAINT ck_price CHECK (price > 0)
, CONSTRAINT visit_doctor_fk FOREIGN KEY (doc_number) REFERENCES doctor
, CONSTRAINT visit_patient_fk FOREIGN KEY (pat_number) REFERENCES patient
);

CREATE TABLE prescription (
  presc_number int PRIMARY KEY  -- might also be a serial?
, visit_id     int NOT NULL REFERENCES visit
, drug         text NOT NULL
);

While being at it I am making price an integer column representing Cents. That's much cheaper. Display as € is simple, I'll demonstrate in the VIEW below.

And are you aware that the CHECK constraint ck_Price CHECK (Price > 0) allows NULL values? Might be as intended.

Function

The currently accepted answer has a number of good points. But not all of it is good. The suggested function is convoluted and inefficient and could be largely simplified.

More importantly, the whole idea of a hand-knit solution is dubious, error-prone, expensive and complicated. You run into the typical UPSERT problem and don't have a solution fit for concurrent use, yet. (A clean solution is under development and may or may not ship with Postgres 9.5.)

But none of this is necessary for you ...

Use a VIEW

I strongly suggest to consider replacing your table VISITS_SUMMARY with a VIEW or, if your tables are big and you need the read performance, a MATERIALIZED VIEW. Then you don't need a function at all. Building on my suggested improvements above:

extract(year FROM v.visit_date) AS year

CREATE MATERIALIZED VIEW AS
SELECT DISTINCT ON (1,2,3)
       v.doc_number
     , v.pat_number
     , extract(year FROM v.visit_date) AS year
     , count(*) OVER ()                AS visits
     , sum(v.price) OVER () / 100.0    AS acum_amount  -- with 2 fractional digits
     , sum(p.drugs_count) OVER ()      AS drugs_count
     , v.visit_date                    AS last_visit
     , p.last_drug
FROM   visit v
LEFT   JOIN (
   SELECT DISTINCT ON (1)
          visit_id
        , drug             AS last_drug
        , count(*) OVER () AS drugs_count
   FROM   prescription
   ORDER  BY 1, presc_number DESC
   LIMIT  1
   ) p USING (visit_id)
ORDER  BY 1, 2, 3, v.visit_date DESC;
  • Drugs_Number is very misleading name for a count of drugs, since you also have doc_number etc. Using drugs_count instead

  • Strictly speaking, since there can be multiple prescriptions for one visit, the "last drug" is ambiguous. Picking one arbitrary drug from the last visit (the "last" one).

  • The expression sum(v.price) / 100.0 forces the result to be numeric automatically, because the numeric constant 100.0 (with fractional digit) is assumed to be numeric automatically. Thus, the integer values in price (representing Cents) are displayed in the desired format with two fractional digits (representing €).

  • The query is somewhat tricky because you need aggregates from both tables as well as the "last" drug. I get the last drug and count per visit_id first, join to visit and calculate all aggregates with window functions to get the last drug from the last visit with DISTINCT ON.
    About DISTINCT ON:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Hi @Erwin thanks for your answer, is good to know different possibilites. I'm doing some problems that I found in internet, books or in my notes (because I'm studiyng a degree), in this situation I take the tables of the statement and do what it ask me. But, as I say, is good to know different and optimal ways. I learn a lot in this site your answers and the answers of other people. Thanks again. – alwayslearn Apr 26 '15 at 11:16