11

I run the exact same Python function, one as a PostgreSQL PL/Python, and the other one outside PostgreSQL as a usual Python script.

Surprisingly, when I call the PostgreSQL PL/Python using select * from pymax7(20000);, it takes on average 65 seconds, while when I call the usual Python script python myscript.py 20000 it takes an average 48 seconds. The averages were computed running the queries and scripts 10 times.

Should such a difference be expected? How does Python inside the PostgreSQL RDBMS (PL/Python) compares with Python outside it in terms of performances?

I'm running PostgreSQL 9.1 and Python 2.7 on Ubuntu 12.04 64bits.

PostgreSQL PL/Python:

CREATE FUNCTION pymax7 (b integer)
  RETURNS float
AS $$    
  a = 0
  for i in range(b):
    for ii in range(b):
      a = (((i+ii)%100)*149819874987) 
  return a
$$ LANGUAGE plpythonu;

Python:

import time
import sys

def pymax7 (b):     
    a = 0
    for i in range(b):
        for ii in range(b):
            a = (((i+ii)%100)*149819874987) # keeping Python busy
    return a

def main():    
    numIterations = int(sys.argv[1])        
    start = time.time()
    print pymax7(numIterations)
    end = time.time()
    print "Time elapsed in Python:"
    print str((end - start)*1000) + ' ms'        

if __name__ == "__main__":
    main()
Franck Dernoncourt
  • 77,520
  • 72
  • 342
  • 501
  • 1
    Comparison with other PL languages would be more fair. – Paulo Scardine May 16 '13 at 00:13
  • 1
    Since PL/x calls the interpreter for x (x = Python ; Perl ; etc), isn't it an interesting comparison to assess how much overhead PL/x has over x? Unless I'm missing something? – Franck Dernoncourt May 16 '13 at 00:15
  • It will not call the interpreter executable, it will bind the target language using embedding mechanisms (in the case of Python, dynamically linking to python libraries, I guess). – Paulo Scardine May 16 '13 at 00:33
  • Doesn't the PostgreSQL interpreter embed CPython? (someone says so in the 6th comment of http://stackoverflow.com/a/16277947/395857) – Franck Dernoncourt May 16 '13 at 00:45
  • That's extremely weird, as PostgreSQL indeed just embeds CPython. I would expect differences when you're doing SQL queries from within PL/Python, but not with regular processing. There should be some small overhead involved in launching PL/Python functions, but not seconds worth. – Craig Ringer May 16 '13 at 01:16
  • 1
    I noticed the issue when writing 2 modules for MADlib (PostgreSQL addon) and the difference of runtime seems to increase as the number of iteration increases (= it's not a fixed cost): http://img849.imageshack.us/img849/7456/sofranckgraph.png ; http://img20.imageshack.us/img20/9621/sonazeengrph.png The only thing I can't think of right now would be that the CPython embedded within PostgreSQL is less optimized than the one I use outside PostgreSQL. Do you know if there is any way to check which Python interpreter PostgreSQL 9.1 embeds? – Franck Dernoncourt May 16 '13 at 02:32
  • (although I would find it surprising that there would be such a difference of performance between 2 CPython versions) – Franck Dernoncourt May 16 '13 at 02:34

1 Answers1

5

There shouldn't be any difference. Both of your test cases have about the same run time for me, 53 seconds plus or minus 1.

I did adjust the PL/Python test case to use the same measuring technique as the plain Python test case:

CREATE FUNCTION pymax7a (b integer)
  RETURNS float
AS $$
  import time
  start = time.time()
  a = 0
  for i in range(b):
    for ii in range(b):
      a = (((i+ii)%100)*149819874987)
  end = time.time()
  plpy.info("Time elapsed in Python: " + str((end - start)*1000) + ' ms')
  return a
$$ LANGUAGE plpythonu;

This would tell you if there is any non-Python overhead involved. FWIW, for me, the difference between what this printed and what psql on the client printed as the total time was consistently less than 1 millisecond.

Peter Eisentraut
  • 35,221
  • 12
  • 85
  • 90
  • Thanks Peter, interesting! Could you please tell me which version of PostgreSQL, Python, and OS you use? I've run the test again on a brand new Ubuntu 12.04 64bits with PostgreSQL 9.1 and Python 2.7.3 as in the original question, I still have the runtime difference. Maybe this is due to the fact that PostgreSQL 9.1 (released on 2011-09-12) embeds some older version of Python than the Python 2.7.3 I use (released on 2012-08-01). – Franck Dernoncourt May 16 '13 at 14:23
  • Although PostgreSQL 9.1 seems to use Python 2.7: http://packages.debian.org/sid/postgresql-plpython-9.1 (why does the package need dep: libpython2.7 (>= 2.7) whereas PostgreSQL embeds CPython?) – Franck Dernoncourt May 16 '13 at 14:24
  • 1
    I tested with PostgreSQL 9.2 (with Ubuntu 12.10 this time), I still have the runtime difference on my server (although overall 10% faster, probably due to some new version of CPython). I use virtual machines, so maybe this can interfere although I wouldn't understand why. I've also tried using `plpython3u`, it is far slower (88 seconds), but when running the Python script using `python3` it is also slower (75 seconds) and anyway I still have the runtime difference, which is the focus of this thread. – Franck Dernoncourt May 16 '13 at 15:05