0

Let's say that given a table observations_tbl with attributes date (day) and value, I want to produce the new attribute prev_day_value to get the following table:

|---------------------|-------|----------------|
|     date            | value | prev_day_value |
|---------------------|-------|----------------|
| 01.01.2015 00:00:00 | 5     | 0              |
| 02.01.2015 00:00:00 | 4     | 5              |
| 03.01.2015 00:00:00 | 3     | 4              |
| 04.01.2015 00:00:00 | 2     | 3              |
|---------------------|-------|----------------|

I am well-aware that such an output can typically be obtained using a WINDOW function. But how would I achieve this through a PostgreSQL user defined function? I want to indicate that I am in a situation where I must use a function, difficult to explain why without going into detail - these are the restrictions I have and if anything, it is a technical challenge.

Take into consideration this template query:

SELECT *, lag(value,1) AS prev_day_value -- or lag(record,1) or lag(date,value,1) or lag(date,1) or lag(observations_tbl,1), etc.
FROM observations_tbl

I am using function lag with parameter 1 to look for a value which comes before the current row by 1 - a distance of 1 row. I don't care what other parameters the function lag can have (table name, other attributes) - what could the function lag look like to achieve such functionality? The function can be of any language, SQL, PL/pgSQL and even C using PostgreSQL API/backend.

I understand that one answer can be wrapping a WINDOW query inside lag user defined function. But I am thinking that would be a rather costly operation if I have to scan the entire table twice (once inside the lag function and once outside). I was thinking that maybe each PostgreSQL record would have a pointer to its previous record which is directly accessible? Or that I can somehow open a cursor at this specific row / row number without having to scan the entire table? Or is what I am asking impossible?

Zeruno
  • 1,391
  • 2
  • 20
  • 39
  • Trying to replicate the efficiency of built-in functions with user-defined functions seems -- shall I say -- challenging. – Gordon Linoff Mar 01 '20 at 04:06
  • Are you sure that PostgreSQL has no way to address this requirement? – Zeruno Mar 01 '20 at 04:09
  • 2
    . . No, I do not know. You could write a wrapper around the internal definition. But something is amiss with the need to do this. It seems like an X-Y problem, where you are focusing on the wrong detail. – Gordon Linoff Mar 01 '20 at 04:12
  • You can inline the functionality into your stored procedure(s) until you find a better way to orient your data :/ – Ross Bush Mar 01 '20 at 04:19
  • If you "must" use a function for some bizarre reasons, then why not simply wrap a query using a window function into a SQL function? –  Mar 02 '20 at 07:24
  • @a_horse_with_no_name my post clearly says that I can do that! but a greater cost would be incurred, thus I also presented discussions on alternate ideas on how I think to work around this cost and asked if they are possible or not. – Zeruno Mar 02 '20 at 17:21
  • Wrapping a query into a `language sql` function incurs hardly any additional cost. –  Mar 02 '20 at 17:45
  • @a_horse_with_no_name I explained how I understood that it would incur scanning over the entire data twice instead of just one time if I have to scan the entire table twice - once inside the lag function and once outside. Is this incorrect on my part? Would help me a lot to know how you think about that. – Zeruno Mar 02 '20 at 20:53

2 Answers2

2

Your request is not possible to solve with relational tools (window functions are not relational extension in SQL). In C language you can write own alternative of function lag. You can do same work in PL8 language (Javascript). Unfortunately the API for window functions doesn't exist for PL/pgSQL. You cannot to write simple PL/pgSQL function that has access to different row than is processed.

The one possible alternative (but with some performance risk) is writing table function. There you have a control over all processed dataset, and you can do this operation simply.

CREATE OR REPLACE FUNCTION report()
RETURNS TABLE(d date, v int, prev_v int) $$
DECLARE r RECORD;
BEGIN
  prev_v := 0;
  FOR r IN SELECT date, value FROM observations_tbl t ORDER BY 1
  LOOP
    d := r.date; v := r.value;
    RETURN NEXT;
    prev_v := v;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

There is not any other alternative usable solution. In very old date these values was calculated with correlated selfjoins, but this solution has pretty terrible performance.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Pavel Stehule
  • 42,331
  • 5
  • 91
  • 94
  • What exactly do you mean with "*window functions are not relational extension in SQL"*? –  Mar 02 '20 at 07:25
  • Relational algebra is based on a sets. The order of field inside set is not important. You can do some operations over these sets - union, multiplication, subset, ... The window function are based on different fundaments - there the order is important. – Pavel Stehule Mar 02 '20 at 08:06
  • The contention that windowed functions is not a relational extension to SQL is just plain wrong - see ISO SQL:2003 and ISO SQL:2008. You are correct that Dr Codd's original 1970 paper (see: Communications of the ACM, volume 13, number 6. 377-387) did not contain them, it was on storage and retrieval. Further window functions are all Post Retrieval so your relational algebra is finished. Are you contending also that ORDER BY is not valid SQL or other normal operations. – Belayer Mar 02 '20 at 20:51
  • @PavelStehule in your example, you specified the return type. I was wondering if there is a way to specify the return type within the body of the function? – Zeruno Mar 02 '20 at 20:58
  • @Zeruno - return type should be defined in interface only and in planning time. Postgres is type strict system. – Pavel Stehule Mar 03 '20 at 04:17
  • @Belayer - I didn't say so window functions are not ANSI/SQL. But these operations cannot be described by relational algebra. Some years ago there was a discussion how much the SQL language have to be relational. – Pavel Stehule Mar 03 '20 at 04:21
1

What Pavel posted, just with fewer assignments. Should be faster:

CREATE OR REPLACE FUNCTION report()
  RETURNS TABLE(d date, v int, prev_v int) AS
$func$
BEGIN
   prev_v := 0;
   FOR d, v IN
      SELECT date, value FROM observations_tbl ORDER BY 1
   LOOP
      RETURN NEXT;
      prev_v := v;
   END LOOP;
END
$func$  LANGUAGE plpgsql;

The general idea can pay if it actually replaces multiple scans over the table with a single one. Like here:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228