Alternatives to broken PL/ruby: convert a warehouse journal table

Question

After upgrading Postgres 8.4 -> 9.3 and Ruby 1.8 -> 2.1, PL/ruby fails to run. I get immediate Postgres server coredump on first execution of any PL/ruby function. I am in the process of analyzing stacktraces, but it does not look good. Also, the maintenance state of PL/ruby does not look good.

So, changing focus to the actual database problem for which I use PL/ruby and think about alternatives.

Simplified example of problem: Given a warehouse journal as a database table with following fields:

date (date)
type of good (foreign key)
count (numeric)

Considering the warehouse runs in strict FIFO mode, I need a list that shows how long each bunch of items has stayed in the warehouse (and a list of remainders which are still in the warehouse):

journal_recno_in (foreign key)
journal_recno_out (foreign key)
count (numeric)

All other info can easily be joined to this.

I need this dynamically created within the current SQL query, so that the most recent data gets included; therefore an external procedure is ruled out. I consider it impossible to solve this with plain SQL query language, so a procedural language seems the only option.

I tried with PL/pgSQL, this is definitely possible, but it looked crude and ugly.

Now I am searching for the way of least pain, having in mind future extensions. Ruby would be my clear favourite, as this language seems to almost code itself along my thinking. But if PL/ruby cannot be brought to a solid behaviour (and that looks currently like a lot additional work and learning), that is rather pointless.

Suggestions? Things that I may have overlooked?

Addendum: outcome of the stacktraces

The first issue was that PL/ruby sets ruby SAFE_LEVEL to 12, while ruby 2.1 accepts a maximum of 3 and does raise otherwise. That is easy to correct, and then it becomes possible to do simple functions. But when doing RETURNS SETOF functions, it crashes again, this time from near rb_iterate() in the ruby lib. I gave up here, with the conclusion that PL/ruby might need a lookover from beginning to end (i.e. 5000+ loc).

@Erwin: here is your req'd data:

Input table:

CREATE TABLE events (
  id serial PRIMARY KEY,
  datum date NOT NULL,
  name_id integer,
  count numeric(12,4),
  created_at timestamp without time zone,
  updated_at timestamp without time zone,
);

Output format:

SELECT * FROM ev_fifo() AS (id_in int, id_out int, 
                           datum_in date, datum_out date, 
                           name_id int, 
                           count numeric)

Input example:

  id  |   datum    | name_id |  count     |      created_at     |      updated_at
------+------------+---------+------------+---------------------+---------------------
  1   | 23.04.2008 |       1 |     1.0000 | 23.04.2008 02:11:45 | 06.06.2008 02:11:45
  2   | 28.04.2008 |       2 |    50.0000 | 29.04.2008 07:17:24 | 16.12.2008 04:32:43
  3   | 03.07.2008 |       2 |   250.0000 | 21.07.2008 01:26:15 | 16.12.2008 04:36:20
  4   | 03.07.2008 |       2 |    -1.0000 | 21.07.2008 01:31:00 | 16.12.2008 04:37:22
  5   | 03.07.2008 |       1 |    -1.0000 | 21.07.2008 01:28:19 | 16.12.2008 04:36:50
  6   | 04.07.2008 |       2 |   -60.0000 | 21.07.2008 01:32:26 | 16.12.2008 04:37:50

Desired output:

  id_in  |  id_out  |  datum_in  |  datum_out  |  name_id |    count
---------+----------+------------+-------------+----------+-----------
  2      |  4       | 28.04.2008 | 03.07.2008  |        2 |    1.0000
  1      |  5       | 23.04.2009 | 03.07.2008  |        1 |    1.0000
  2      |  6       | 28.04.2008 | 04.07.2008  |        2 |   49.0000
  3      |  6       | 03.07.2008 | 04.07.2008  |        2 |   11.0000
  3      |  NULL    | 03.07.2008 | NULL        |        2 |  239.0000

plv8 or plcoffee might be easier for you to use (or at least more familiar) and they even appear to be maintained: http://pgxn.org/dist/plv8/ — mu is too short, Apr 14 '16 at 02:16
The V8 JavaScript engine for Postgres is probably your best plan. PL/SQL is crude and ugly, but very well supported. You could also do this in Ruby externally via a Postgres database connection. — tadman, Apr 14 '16 at 03:11
`I consider it impossible to solve this with plain SQL query language`. I don't. Modern SQL can probably do it. Provide your actual table definitions (a complete `CREATE TABLE` statement showing data types and constraints), a small set of sample data, the desired result and your actual Postgres version. (Why Postgres 9.3? if you upgrade, upgrade all the way to Postgres 9.5!) If you really need procedural elements, switch to PL/pgSQL. — Erwin Brandstetter, Apr 14 '16 at 22:15
`2009` in your result is a typo, I assume? Also, I am only notified if you [@-notify me in a comment](http://meta.stackexchange.com/questions/43019/how-do-comment-replies-work). — Erwin Brandstetter, Apr 15 '16 at 03:32

score 1 · Answer 1 · answered Apr 14 '16 at 07:48

Let's start with your options:

pl/pgsql and sql
pl/perl, pl/pythonu, and pl/tcl
Other pls

These major categories have differences in strengths and weaknesses. They also have differences in how you go about things. One of the big weaknesses of external pls like pl/ruby is that if they fail to be maintained, you may have a problem later.

PL/PGSQL and SQL

In these cases you can probably express your changes as an SQL query with recursive common table expression. Then you can use sql or, if you need some slight procedural support, add that and use pl/pgsql. This is usually how I approach this.

PL/Perl, PL/TCL and PL/PythonU

You may also be able to port your Ruby code to Python or Perl and use the PL variations of these languages. These PL's are widely used and maintained as part of the core distribution of PostgreSQL. They are not going away. This would allow you better transparency of how the logic moves over.

One significant limitation of PL/Python is it has no trusted mode, and one issue you will run into with pl/perl is that trusted mode means no access to external modules.

score 0 · Accepted Answer · edited Jun 20 '20 at 09:12

Pure SQL, single query

As proof of concept, since I kind of challenged you, this single SQL query does it all:

WITH i AS (  -- input summed up
   SELECT id_in, datum_in, name_id, count, numrange(sum - count, sum) AS rng
   FROM  (
      SELECT id AS id_in, datum AS datum_in, name_id, count
           , sum(count) OVER (PARTITION BY name_id ORDER BY datum, id) AS sum
      FROM   events
      WHERE  count > 0
      ) sub
   )
,    o AS (  -- output summed up
   SELECT id_out, datum_out, name_id, count, numrange(sum + count, sum) AS rng
   FROM  (
      SELECT id AS id_out, datum AS datum_out, name_id, count
           , sum(count) OVER (PARTITION BY name_id ORDER BY datum, id) * -1 AS sum
      FROM   events
      WHERE  count < 0
      ) sub

   UNION ALL  -- add ghost range for items still in store
   SELECT NULL AS id_out, NULL AS datum_out, name_id, sum_in - sum_out AS count
        , numrange(sum_out, sum_in) AS rng
   FROM   (
      SELECT name_id, sum(CASE WHEN count > 0 THEN count END)          AS sum_in
           , COALESCE(sum(CASE WHEN count < 0 THEN count END) * -1, 0) AS sum_out
      FROM   events
      GROUP  BY 1
      ) sub
   WHERE  sum_in > sum_out  -- only where items are left
   )
SELECT i.id_in, o.id_out, i.datum_in::text, datum_out::text, i.name_id
     , upper(i.rng * o.rng) - lower(i.rng * o.rng) AS count  -- range intersect operator *
FROM   i
JOIN   o USING (name_id)
WHERE  i.rng && o.rng  -- range overlaps operator &&
ORDER  BY datum_out, id_out, datum_in, id_in;

Assuming that the underlying table is consistent: no more items can be deducted than have been added previously. I.e., sum of output <= sum of input per name_id.

Tested with Postgres 9.3. Produces your result exactly. And should perform decently.

Working with range types and range operators to simplify the task.

SQL Fiddle with extended data to show corner cases.

PL/pgSQL function with two cursors

I expect this approach to be substantially faster, though: Run two cursors in parallel, one over input, the other over output columns. So we walk through the table only once.

This related answer implements the basic logic (chapter "FNC - Function"):

Window Functions or Common Table Expressions: count previous rows within range

Oh my gods! This is terrific! I don't understand it yet, but I ran it on the full data, and it seems to provide proper results. Never imagined that would be possible. And You're using constructs which I just didn't know about. (Probably, in 2008, when I wrote the ruby code, they weren't present yet?) — PMc, Apr 16 '16 at 19:58
@PMc: 2008, that would be Postgres 8.3. No CTEs yet, no window functions, no range types / operators. The query would have been *possible*, but very verbose, awkward and slow. How does the query perform as compared to your previous solution? (If performance is important look into my 2nd suggestion.) — Erwin Brandstetter, Apr 16 '16 at 20:42
Performance - I can only say that the ruby code appeared to be slower. This runs on ~1000 records, so what I am seeing is some ms which might rather reflect the mood of the machine. Also, this is only the logical core of the routine - what takes more noticeable time are peripherial tasks like joining price quotes (The piece does calculate tax&performance on some company share investments - and there is still the task of handling share split events - on occasion, and with the tools You showed me, I'll ponder about that.) — PMc, Apr 17 '16 at 04:21
@PMc: For just a thousand rows, look no further. This should be good enough. Have fun with your new toys! :) — Erwin Brandstetter, Apr 17 '16 at 12:08

Alternatives to broken PL/ruby: convert a warehouse journal table

Addendum: outcome of the stacktraces

2 Answers2

Pure SQL, single query

PL/pgSQL function with two cursors

Linked