
I have a PL/Python function which does some JSON magic. For this it obviously imports the json library.

Is the import executed on every call to the function? Are there any performance implications I have to be aware of?

vyegorov
Mauli

2 Answers


The import is executed on every function call. This is the same behavior you would get if you wrote a normal Python module with the import statement inside a function body as opposed to at the module level.

Yes, this will affect performance.

You can work around this by caching your imports like this:

CREATE FUNCTION test() RETURNS text
LANGUAGE plpythonu
AS $$
if 'json' in SD:
    json = SD['json']
else:
    import json
    SD['json'] = json

return json.dumps(...)
$$;

This is admittedly not very pretty, and better ways to do this are being discussed, but they won't happen before PostgreSQL 9.4.
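The caching logic can be exercised in plain Python by standing a dict in for SD (a sketch only; `SD` below is an ordinary dictionary, not the real per-function dictionary PL/Python supplies):

```python
SD = {}  # stand-in for PL/Python's per-function SD dictionary

def test():
    # Same logic as the PL/Python body: import once, cache the module in SD.
    if 'json' in SD:
        json = SD['json']
    else:
        import json
        SD['json'] = json
    return json.dumps({"ok": True})

first = test()   # performs the import and stores the module in SD
second = test()  # fetches the cached module from SD
```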

Peter Eisentraut
  • Ah, only after your answer I looked up in the documentation that there is a global dictionary SD. Thank you very much. – Mauli Feb 22 '13 at 14:15
  • @Peter Eisentraut do you know if there have been any improvements to this issue in recent pg versions? – John Bachir Jun 30 '16 at 23:04
  • There have not been any changes to this. – Peter Eisentraut Jul 01 '16 at 02:00
  • How about now? Also, where would you find that change? – Shmuel Kamensky Dec 06 '18 at 21:48
  • Is this still the recommended way for handling slow imports in PostgreSQL 11 and above? – Sven Geggus Dec 01 '19 at 11:33
  • Yes, no changes. – Peter Eisentraut Dec 08 '19 at 12:48
  • @PeterEisentraut: Unless PL/Python functions behave differently than Python's, then you may be mistaken. Importing a module for the first time is expensive because of the steps involved between locating the script and building the module object. Once built, though, that object is cached in `sys.modules` and subsequent imports will simply fetch it from there. – Michael Ekoka Mar 15 '20 at 21:19
  • @PeterEisentraut I think this is wrong. If you run `do $$ import functools $$ language plpython3u;` and then `do $$ import sys; plpy.notice('functools has already been imported' if 'functools' in sys.modules else 'functools has not been imported') $$ language plpython3u;` the output is 'functools has already been imported'. – Shmuel Kamensky Mar 25 '20 at 14:23

The body of a PL/Python function eventually becomes the body of an ordinary Python function and thus behaves as such. When Python imports a module for the first time, the module is cached in the sys.modules dictionary (https://docs.python.org/3/reference/import.html#the-module-cache). Subsequent imports of the same module simply bind the import name to the module object found in that dictionary. In a sense, this casts some doubt on the usefulness of the tip given in the accepted answer, since it makes it somewhat redundant: Python already does similar caching for you.

To sum things up: if you import in the standard way, with the import or from [...] import constructs, you need not worry about repeated imports, in functions or otherwise; Python has you covered.
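This can be checked in ordinary Python (a sketch, not PL/Python): after the first call, the function-level import amounts to a `sys.modules` lookup plus a local name binding.

```python
import sys

def uses_json():
    import json  # cheap after the first call: fetched from sys.modules
    return json.dumps([1, 2, 3])

result = uses_json()          # first call loads and caches the module
assert 'json' in sys.modules  # subsequent imports hit this cache

# The module object seen inside the function is the cached one:
import json
assert sys.modules['json'] is json
```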

On the other hand, Python allows you to bypass its native import semantics and implement your own (with the __import__() function and the importlib module). If this is what you're doing, maybe you should first review what's available in the toolbox (https://docs.python.org/3/reference/import.html).
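For example, `importlib.import_module()` consults the `sys.modules` cache just like a plain `import` statement does, while `importlib.reload()` forces the module's code to be re-executed (a minimal sketch using only documented stdlib calls):

```python
import importlib
import sys

mod1 = importlib.import_module('json')
mod2 = importlib.import_module('json')
assert mod1 is mod2                 # both resolved from the sys.modules cache
assert mod1 is sys.modules['json']

# reload() bypasses the cache and re-executes the module's code;
# the existing module object is reused and updated in place.
mod3 = importlib.reload(mod1)
assert mod3 is mod1
```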

Michael Ekoka
  • This is correct. `import` will not do anything if the module has already been imported in a different function. I ran into this issue when writing a module around plpython. You need to restart the database (which in turn restarts the interpreter) for imports to be re-run. Or, like you suggest, mess around with importlib. – Shmuel Kamensky Mar 25 '20 at 14:25
  • Note that the imported modules are cached *per database connection*. So if your backend has multiple connections to the database (e.g., through a connection pool), each database connection has its own set of cached modules. This hit me because I had a "large" `import` in a `BEFORE INSERT OR UPDATE ON my_table` trigger. This trigger caused a noticeable 3-4 second delay for the first triggering transaction of every database connection. Ouch. I fixed it via a manual preload on connection initialization (moving the `import` delay up front). – Frederik Aalund Mar 11 '23 at 10:11
  • This is the answer, not the accepted one. **TLDR:** import all you want; it will only get executed once per Python process. – winwin May 12 '23 at 15:35