0

The test below shows that it is possible, but I am not verified performance (comparing for example with Bash or Python), and I don't understand exactly how to do or what happens:

  • The lo_import/lo_get (in my test_lo_proc) created a copy of the file, or can it be used as (I supposed) an external reference?
  • Is it necessary to do lo_unlink?
  • The alternative, pg_read_binary_file as commented by @a_horse_with_no_name, is not limited in file size?
CREATE or replace FUNCTION test_lo_proc(
  file_name text
) RETURNS text AS $f$
 DECLARE
  doc_oid oid;
  md5 text;
 BEGIN
  doc_oid = lo_import(file_name);
  md5 = md5(lo_get(doc_oid));
  -- ? lo_unlink(doc_oid)
  RETURN md5;
 END;
$f$ LANGUAGE PLpgSQL;

SELECT test_lo_proc('/tmp/bigbigfile.zip');

PS: the MD5 is to compare performance (with md5sum *.zip for example), real-life function use for example MD5(lo_content||metadata::bytea) or something more complex using internal database resources... I am not looking for external execution (when you can use COPY t FROM PROGRAM).

Peter Krauss
  • 13,174
  • 24
  • 167
  • 304
  • It seems like you have found a way. So what exactly is your question? – Laurenz Albe May 03 '21 at 10:54
  • Using a large object seems overkill just to calculate the MD5 hashsum. Why not use [pg_read_binary_file](https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-GENFILE) –  May 03 '21 at 10:57
  • Hi @LaurenzAlbe, sorry my English... The main question is "I do correct? I am doing with **no copy**?", and the secondary question is "I need lo_unlink?" – Peter Krauss May 03 '21 at 10:57
  • Yes, you need to unlink. I'll look at the rest later. – Laurenz Albe May 03 '21 at 10:58
  • Hi @a_horse_with_no_name, thanks, good clue and alternative. .. I was avoiding pg_read functions because seems there is a restriction for big files, as in `pg_read_file` limited to 922337203 bytes. – Peter Krauss May 03 '21 at 11:21

1 Answers1

0

I can think of two better alternatives:

  1. Like the horse suggested, use pg_read_binary_file to read the file into memory and calculate a checksum with the md5 function.

    The disadvantage is that you have to read the whole file into memory.

  2. Write a PL/PerlU or PL/Python function that calculates the checksum of the file. That will probably perform better. Here is a Stackoverflow question that deals with that.

Laurenz Albe
  • 209,280
  • 17
  • 206
  • 263