UDF to encode strings works in psql and Perl but not in Python

Question

I wrote a user defined function in Postgres 9.4 to encode strings:

CREATE OR REPLACE FUNCTION platform.encode_sig(sig text)   
RETURNS bigint AS $BODY$ 
  declare   sig_id bigint; 
begin
    lock table platform.sig2encodings in access exclusive mode;   
    execute 'select sig_id from platform.sig2encodings where sig = ''' || sig || '''' into sig_id;

    if sig_id is null   
    then
       raise notice 'I do not have encoding for %', sig;
       execute 'insert into platform.sig2encodings (sig) values (''' || sig || ''')';
       execute 'select sig_id from platform.sig2encodings where sig = ''' || sig || '''' into sig_id;   
    else
       raise notice 'I do have encoding for %', sig;   
    end if;

  return sig_id;

END; 
$BODY$   
LANGUAGE plpgsql VOLATILE   COST 100;

The table:

CREATE TABLE platform.sig2encodings
(
  sig_id bigserial NOT NULL,
  sig text,
  CONSTRAINT sig2encodings_pkey PRIMARY KEY (sig_id ),
  CONSTRAINT sig2encodings_sig_key UNIQUE (sig )
)

The call in pgadmin or psql inserts the data into the table:

select * from platform.encode_sig('NM_Gateway_NL_Shutdown');

The call in python gets the id, but does not insert the data:

db="""dbname='XXX' user='XXX' password='XXX' host=XXX port=XXX"""

def encode_sig(sig):
   try:
      conn=psycopg2.connect(db)
   except:
      print "I am unable to connect to the database."
      exit()

   cur = conn.cursor()
   try:
      sql = "select * from platform.encode_sig('" + sig + "');"
      print sql
      cur.execute(sql)
   except:
      print "I can't retrieve sid"

   row = cur.fetchone()
   return row[0]

print str(encode_sig('NM_Gateway_UDS_CC'))

Output from the python script:

$ ./events_insert.py 
616
617
618
619
620
621
$ ./events_insert.py 
622
623
624
625
626
627

The table in postgres is empty. What is going on?

Update:

The following perl script works (with all the console outputs (NOTICEs) and the rows in the table):

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use DBI;

my $dbh = get_connection();
$dbh->do("SELECT platform.encode_sig('blah blah blah')");
$dbh->disconnect();

sub get_connection {
    return DBI->connect('dbi:Pg:dbname=XXX;host=XXX;port=XXX',
                        'XXX', 'XXX', { RaiseError => 1 });
}

The DB configuration is pretty standard one. These lines come from the postgresql.conf (since they are commented out, default values are assumed):

#fsync = on                             # turns forced synchronization on or off
#synchronous_commit = on                # synchronization level;
                                        # off, local, remote_write, or on
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_log_hints = off                    # also do full page writes of non-critical updates
                                        # (change requires restart)
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds

#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

Please show us the output of your Python program. Which credentials is it using to connect to the database? Finally, why are you writing your code in a way that is susceptible to a SQL injection attack? — Colin 't Hart, Apr 13 '15 at 21:58
And why are you locking the table? Your primary and unique keys should offer you more than enough locking against simultaneous inserts of conflicting values. — Colin 't Hart, Apr 13 '15 at 21:59
You should use `select platform.encode_sig('" + sig + "');` for scalar functions. `select * from ...` is for functions returning a result set (i.e. multiple rows) — , Apr 13 '15 at 22:09
Which `NOTICE` do you see? 'I do have encoding' or 'I do not have encoding'? — Erwin Brandstetter, Apr 15 '15 at 13:26
we tested the functionality java too. Without the commit statement the rows were gone. It might be that python does not do the commit after the "select" statement resulting in dropped rows. Why the sequence is updated remains to be investigated — arthur, Apr 27 '15 at 21:42

score 2 · Accepted Answer · edited May 23 '17 at 11:43

It's unclear, how the table can be empty after you see sig_id's returned. The only plausible explanations that come to mind:

You are checking a different table (in a different schema or a different DB) by accident.
You are running with auto_commit = off and forgot to COMMIT your transaction. Results are not visible to other sessions before COMMIT.

Either way, your function is needlessly convoluted, you wouldn't need dynamic SQL with EXECUTE. Since you are concatenating unescaped text parameters into code, you are wide open to random syntax errors and SQL injection.
You are also dangerously close to naming conflicts between the parameter name sig and the column name sig. You ditch this last bullet with your dynamic SQL, but it's still a loaded foot-gun. Read the chapter Variable Substitution for PL/pgSQL in the manual and consider unique names.

Finally, it is also extremely inefficient to have one function call per row. The whole procedure can be replaced with this single SQL statement:

LOCK TABLE platform.sig2encodings IN ACCESS EXCLUSIVE MODE;

WITH sel AS (
   SELECT e.sig_id, e.sig
       , (s.sig IS NULL) AS insert_new
   FROM   platform.encode_sig e
   LEFT   JOIN platform.sig2encodings s USING (sig)
   )
,    ins AS (
   INSERT INTO platform.sig2encodings (sig)
   SELECT sig FROM sel WHERE insert_new
   RETURNING sig_id, sig, true  -- value for insert_new
   )
SELECT * FROM sel WHERE NOT insert_new
UNION ALL
SELECT * FROM ins;

This inserts all sig from encode_sig into sig2encodings that were not there, yet. It returns the resulting sig_id, sig and insert_new = true, appended to sig_id, sig and insert_new = false from encode_sig that were not inserted.

If you need a function for single row INSERT-or-SELECT that's safe for concurrent use:

Is SELECT or INSERT in a function prone to race conditions?

Or you hope that INSERT .. ON CONFLICT IGNORE makes it into the next release to simplify things:

PL/pgSQL column name the same as variable

Update: It has been committed for 9.5. The /devel manual already has instructions.

the UDF is meant to be an interface to generate strings for ids, which is meant to be executed from different environmens including java, perl, python and C++. Calling a UDF looked from the coding point of view nicer in JAVA and python code, compared a rather long SQL query... Comments? — arthur, Apr 22 '15 at 17:13

score 0 · Answer 2 · answered May 12 '15 at 12:01

After long try and error we found out that this was resulted by the lack of "commit" statement for the connection. python (and alternatively a combination of java+postgres drivers+postgres) skipped the "commit" statement on the exit of the script resulting in inconsistent state of the DB (the sequence was updated but the tables were not). So the solution is to add the following line in python script:

  conn.commit()

UDF to encode strings works in psql and Perl but not in Python

2 Answers2