0

I am trying to query the following data:

Student_ID  Site  Start    End       Primary_or_Secondary
1           A     1/1/19   2/28/19   Primary
1           B     2/1/19   6/30/19   Secondary
1           C     3/1/19   6/30/19   Primary

and get a result that looks like the following:

Student_ID  Primary   Secondary  Start   End
1           A         null       1/1/19  1/31/19
1           A         B          2/1/19  2/28/19
1           C         B          3/1/19  6/30/19

So basically, a site can be primary or secondary site for a student, and I want to be able to see all the time frames the student are enrolled separately instead of any time frame overlapping.

I have wracked my brain about how I might do this in PostgreSQL, and have even looked at the crosstab function, but the dates are making it hard for my brain :-)

Any help with a query or set of queries, including some CTEs would be really helpful!

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • So we can assume that the same student never has overlapping primaries or secondaries? Each can only overlap with the other? And we can assume current Postgres? And do you query one student at a time or *all* students at once? – Erwin Brandstetter Jul 10 '19 at 01:43
  • Great Questions: 1. Yes, the same student does not have overlapping primaries and secondaries, and the primary and secondary only overlap with each other. 2. Yes, I believe it is the most current Postgres, but it is hosted, so I will double check, but I know it is pretty current. 3. I will be querying a group of students, not just one at a time. – Jacob Walker Jul 10 '19 at 11:29
  • 1
    Please [edit] your question accordingly. And use unambiguous date literals. I suggest ISO format: `'2019-02-28'` (always unambiguous). A table definition showing data types and constraints (`CREATE TABLE` script) would also be instrumental. – Erwin Brandstetter Jul 10 '19 at 17:00
  • 1
    Note: it is more convinient to use half-open intervals(jan1--feb1 instead of jan1--jan31) And: please don't use MDY date formats] – wildplasser Jul 10 '19 at 19:45

1 Answers1

0

This isn't trivial. A mix of crosstab with overlapping and intersecting ranges, plus corner cases (merge same start / end dates) on top. Very hard to solve with set-based operations i.e. pure SQL.

I suggest a procedural solution in PL/pgSQL instead. Should perform nicely, too, as it only needs a single (bitmap-index) scan over the table:

CREATE OR REPLACE FUNCTION f_student_xtab(VARIADIC _student_ids int[])
  RETURNS TABLE (
      student_id int
    , "primary"  text
    , secondary  text
    , start_date date
    , end_date   date) AS
$func$
DECLARE
   r record;
BEGIN
   student_id := -1; -- init with impossible value

   FOR r IN
      SELECT t.student_id, t.site, t.primary_or_secondary = 'Primary' AS prim, l.range_end, l.date
      FROM   tbl t
      CROSS  JOIN LATERAL (
         VALUES (false, t.start_date)
              , (true , t.end_date)
         ) AS l(range_end, date)
      WHERE  t.student_id = ANY (_student_ids)
      ORDER  BY t.student_id, l.date, range_end -- start of range first
   LOOP
      IF r.student_id <> student_id THEN
         student_id := r.student_id;
         IF r.prim THEN "primary" := r.site;
                   ELSE secondary := r.site;
         END IF;
         start_date := r.date;
      ELSIF r.range_end THEN
         IF r.date < start_date THEN
            -- range already reported
            IF r.prim THEN "primary" := NULL;
                      ELSE secondary := NULL;
            END IF;
            start_date := NULL;
         ELSE
            end_date := r.date;
            RETURN NEXT;
            IF r.prim THEN
               "primary" := NULL;
               IF secondary IS NULL THEN start_date := NULL;
                                    ELSE start_date := r.date + 1;
               END IF;
            ELSE
               secondary := NULL;
               IF "primary" IS NULL THEN start_date := NULL;
                                    ELSE start_date := r.date + 1;
               END IF;
            END IF;
            end_date := NULL;
         END IF;
      ELSE  -- range starts
         IF r.date > start_date THEN
            -- range already running
            end_date := r.date - 1;
            RETURN NEXT;
         END IF;
         start_date := r.date;
         end_date := NULL;
         IF r.prim THEN "primary" := r.site;
                   ELSE secondary := r.site;
         END IF;
      END IF;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_student_xtab(1,2,3);

Or:

SELECT * FROM f_student_xtab(VARIADIC '{1,2,3}');

db<>fiddle here - with extended test case

About VARIADIC:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thank you so much!!! I was wracking my brain on this one, and I came to a similar conclusion about it not being trivial. I will take some time to dive into your code, and I truly appreciate the time you spent helping with this. – Jacob Walker Jul 13 '19 at 13:37
  • @JacobWalker: If this answered your question properly, consider accepting it. – Erwin Brandstetter Dec 20 '22 at 03:52