0

Basically, I'm working with data from icu patients and studying a particular disease. For every patient, I have the disease stage during all of the patient's stay in the ICU, as you can see here:

timeline table

The other table I have it's the records table:

records table

I simply want to add to the records table the stage of the disease when that measure was made, which can be made simply by records.charttime BETWEEN timeline.starts AND timeline.ends, but I want to do it in the most optimized way possible due to the records table having nearly 100 million rows, and the timeline table over 100k rows.

I read about the usage of CURSOR, but I don't know if it makes sense using CURSOR in this case, or maybe I'm just adding complexity and it can be solved easily using JOIN.

What's the best solution?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • 2
    Please provide table definition and sample data as plain text, not a link to an image. And always declare your version of Postgres. Consider instructions here: https://stackoverflow.com/tags/postgresql-performance/info – Erwin Brandstetter Apr 20 '20 at 03:29
  • Are you trying to add a new column to the "records" table and populate it? Or are you just trying to view the results of a join dynamically without altering the structure of your tables? – jjanes Apr 20 '20 at 12:16

1 Answers1

1

Looping with an explicit cursor is typically slower than a plain join. Postgres will do the looping for you, most likely with a nested loop in the query plan.

SELECT r.*, t.stage
FROM   records       r
LEFT   JOIN timeline t ON r.icustay_id = t.icustay_id -- guessing ... ?
                      AND r.charttime >= t.starts
                      AND r.charttime <  t.ends;      -- excl. upper bound

BETWEEN would (probably incorrectly) include the upper bound.

The LEFT JOIN preserves all rows from records, even if no match is found in timeline.

I assume ranges in timeline cannot overlap? Else, you get duplicated rows. There should probably be an exclusion constraint for table timeline. See:

Since you are processing the whole table anyway, indexes won't be of much help.

However, since your records table is so big, you may not have enough memory available to process everything in RAM, so it might pay to process partitions of your records table at a time. The best way to partition depends on undisclosed details, most importantly the physical sort order of rows in both tables.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228