I have a function written in PL/pgSQL (PostgreSQL) that goes over a large table and inserts a load of values into a different table. The output looks fine, with loads of lines apparently being inserted, but no values actually end up in the target table (the "resources" table in my code).

I have tried putting the insert statement inside a transaction, to no avail. Is there some sort of fudgy access or permission settings that I am missing? I have found several examples on the web that do this like I am doing, so I am pulling a little hair on this one...

Here is my function:

DECLARE
datatype_property record; 
property record;
new_resource_id bigint;
BEGIN  
    RAISE NOTICE 'Starting...';
    FOR datatype_property IN  
      SELECT * FROM datatype_properties
    LOOP  
        RAISE NOTICE 'Trying to insert';


        if not exists(select * from resources where uri = datatype_property.subject_resource) then
              SELECT INTO new_resource_id NEXTVAL('resources_id_seq');  
              INSERT INTO resources (id, uri) VALUES(  
                    new_resource_id,    
                    datatype_property.subject_resource
              );   
            RAISE NOTICE 'Inserted % with id %',datatype_property.subject_resource, new_resource_id;
        end if;
    END LOOP; 

    FOR property IN
        SELECT * FROM properties
    LOOP
        -- insert the source URI if it is not in resources yet
        if not exists(select * from resources where uri = property.source_uri) then
            SELECT INTO new_resource_id NEXTVAL('resources_id_seq');
            INSERT INTO resources (id, uri) VALUES(
                new_resource_id,
                property.source_uri
            );
            RAISE NOTICE 'Inserted % with id %', property.source_uri, new_resource_id;
        end if;
        -- insert the destination URI if it is not in resources yet
        if not exists(select * from resources where uri = property.destination_uri) then
            SELECT INTO new_resource_id NEXTVAL('resources_id_seq');
            INSERT INTO resources (id, uri) VALUES(
                new_resource_id,
                property.destination_uri
            );
            RAISE NOTICE 'Inserted % with id %', property.destination_uri, new_resource_id;
        end if;
    END LOOP;
 RETURN;  

END;

EDIT: I've activated the plpgsql language with the directions from the following link:

http://wiki.postgresql.org/wiki/CREATE_OR_REPLACE_LANGUAGE

EDIT 2:

this code:

DECLARE
datatype_property record; 
property record;
new_resource_id bigint;
BEGIN  

    insert into resources (id, uri) values ('3', 'www.google.com');
END

does not work either :O

João Rocha da Silva

1 Answer

Your problem does sound like you are not committing your transaction (as Pavel pointed out), or like the tool you use to check the rows is getting in the way, e.g. by using REPEATABLE READ as its isolation level or by doing some kind of caching.
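A minimal sketch of what "committing the transaction" looks like from the caller's side. The function name `populate_resources()` is hypothetical (the question does not show the `CREATE FUNCTION` header), and the sketch assumes the function returns `void` and is run from a SQL client such as psql:

```sql
-- populate_resources() is a hypothetical name for the function in the question.
BEGIN;

-- Run the function; its inserts are part of the surrounding transaction.
SELECT populate_resources();

-- Inside the same transaction the new rows are already visible.
SELECT count(*) FROM resources;

-- Other sessions only see the rows after this point.
COMMIT;
```

If the client has autocommit switched off and the `COMMIT` is never sent, the inserts are rolled back when the connection closes, which would match the symptom of notices being printed while the table stays empty.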

But your function isn't a good solution to begin with. Inserting rows one by one in a loop is always a bad idea. It will be much slower than doing a single insert (and will be less scalable).

If I'm not mistaken, the two loops can be rewritten into the following statements:

insert into resources (id, uri)
select NEXTVAL('resources_id_seq'),
       dt.subject_resource
from datatype_properties dt
where not exists (select 1
                  from resources r
                  where r.uri = dt.subject_resource);


insert into resources (id, uri)
select nextval('resources_id_seq'),
       p.source_uri
from properties p
where not exists (select 1 
                  from resources r 
                  where r.uri = p.source_uri
                     or r.uri = p.destination_uri);
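One difference from the loop is worth noting: a single statement does not see the rows it inserts itself, so a URI that occurs several times in the source tables would be inserted several times, and the second statement above never inserts a `destination_uri`. A possible variant, offered only as a sketch under the assumption that the goal is one row in `resources` per distinct URI, collects all three columns first:

```sql
-- Sketch only: assumes each distinct URI should end up in resources once.
insert into resources (id, uri)
select nextval('resources_id_seq'),
       u.uri
from (
    -- UNION (without ALL) removes duplicate URIs across the three columns
    select subject_resource as uri from datatype_properties
    union
    select source_uri from properties
    union
    select destination_uri from properties
) u
where not exists (select 1
                  from resources r
                  where r.uri = u.uri);
```

Because `union` deduplicates before the insert, `nextval` is called once per URI that is actually inserted.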
  • Yeah. Thanks a lot for all your replies, guys. When the data was not being inserted, the first thing I tried was a commit, but the only thing I got was syntax errors. In the meantime I have discovered that you cannot do an explicit commit in a PostgreSQL function! http://stackoverflow.com/questions/5448984/commit-savepoint-rollback-to-in-postgresql As for the single insert, I am now using a small Java program to make the insertions, using batch Statements and executeBatch(), so it goes along with a_horse_with_no_name's reasoning. Thanks, response accepted! – João Rocha da Silva Jul 20 '12 at 16:46
  • The moral of the story is that you should leave stored procedures to DBMSs that really support them in full, e.g. Oracle or SQL Server (even though I love open-source solutions)... – João Rocha da Silva Jul 20 '12 at 16:50
  • @JoãoRochadaSilva: PostgreSQL ***does*** fully support stored procedures (or functions, that is). Apparently there is something in your environment that you are not telling us. The only difference is that Postgres requires the *caller* to handle the transaction. Btw: doing multiple inserts via batch will still be slower than a single insert as I have shown. –  Jul 20 '12 at 19:46
  • That's true, that's what I would do under normal circumstances. However, my batch insert is REALLY big (talk about 140 million rows), and is only run once. A single file containing all the text required for the insert would be unwieldy at best. About the "full support" statement, I was referring to the ability to call explicit commits whenever I feel like it (even though it may be wrong, but in real systems we all know that good practice is sometimes disregarded in favour of something that *works*). – João Rocha da Silva Jul 21 '12 at 19:57