2

I am new to postgres and i'm am having to populate a database using it. I cant submit my actual code for reference but my program is in java and i'm trying to do something along these lines.

IF EXISTS (SELECT 1 FROM TEST WHERE FOO = 2 and BAR = 3) 
 BEGIN
   SELECT TableID FROM TEST WHERE FOO = 2 and BAR = 3
 END
 ELSE
 BEGIN
   INSERT INTO TEST(FOO, BAR) VALUES(2, 3)
 END

I'm trying to do something along these lines because my program is just parsing a bunch of files and populating the database. I need to ensure that if someone goes back and runs the program again on the same files that the database wont contain duplicate entries. The id associated with this insert/record found is used to populate a linking table. I found this example on another stackoverflow questions only it relates to Microsoft SQL Server 2005. I'm using postgres and it fails on the very first line at IF. I understand different database languages operate slightly different. Is there a way to do an IF EXISTS return ID else insert using postgres.

Currently i'm checking if the entry exist and if it doesnt insert. But then i'm having to query the database again to get the id of the record had it existed.

int id = 0;
PrepareStatement ps = conn.prepare(SQL command, statement.generatekeys);  //syntax is not correct for the preparestatement but i'm not looking at my code and dont remember it off top of my head.
ResultSet rs = ps.executeUpdate();
if(rs.next()){
   if(rs.getGeneratedKeys().rows() > 0)  //not sitting at my code but there is a way to check if a key was generated from an insert
     id = rs.getGeneratedKeys()
   }else{
     id = rs.getInt(1);       
   }
}

Update Taken from comments:

The better questions is Is there a way to do an if exist return id else insert without querying/checking a where clause twice.

I want to avoid doing a WHERE check twice to get either my id from existing data or from the new entry. My condition checks are rather lengthy as some of my tables contain upwards of 14 attributes. Currently what i do is query the database once doing an INSERT if doesn't exist. If my getGeneratedKeys().size() = 0 i know the INSERT failed because it exist and then I re-query the database with the same where clause from my original query and grab the id.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
Jeremy
  • 935
  • 5
  • 18
  • 33
  • I think we were a bit hasty in flagging this as a dup; it's not quite an upsert, it's a "search and insert if not found". – Craig Ringer Jul 01 '13 at 23:40

1 Answers1

3

This is closely related to an UPSERT or MERGE operation, which isn't supported directly by PostgreSQL. Upsert is rather more complicated in a concurrent environment than you'd expect.

Most of the concurrency problems with upsert also apply to what you want to do. Your current code is entirely wrong in the presence of more than one thread doing updates at a time.

Read depesz's article on the topic then search Stack Overflow for [postgresql] upsert or [postgresql] merge, and this SO post I wrote.

You should be able to fairly easily adapt the PL/PgSQL function given there to do the job. It'll only run correctly in READ COMMITTED isolation, though.

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • I was looking at the two things you mentioned and it doesnt have an example in postgres of an if exist return id else insert or even point me in a direction that might help explain if its possible in postgres. – Jeremy Jul 01 '13 at 20:14
  • @Jeremy That's a variant on the same problem, with the same issues. You can adapt the PL/PgSQL function given in http://stackoverflow.com/questions/17267417/how-do-i-do-an-upsert-merge-insert-on-duplicate-update-in-postgresql/17267423#17267423 to fit. – Craig Ringer Jul 01 '13 at 23:39
  • If i'm understanding and reading the example correctly it requires me to know the id going into the query of what the insert would be. Each of my if exist return id else insert queries is using a seqence. So I dont know the ID of the insert until after its completed and i get it from rs.getGeneratedKeys(). So i dont see how I can make the example your referencing work. My check to see if it exist is just checking if each of the column fields are the same as what i'm attempting to enter. Assuming it is i would like to return said id. – Jeremy Jul 02 '13 at 16:37
  • 1
    The better questions is "Is there a way to do an if exist return id else insert without querying/checking a where clause twice." IE not doing a where check 2 times to get either my id from existing data or from the new entry. My condition checks are rather lengthy as some of my tables contians upwards of 14 attributes. Currently what i do is query the database once doing an insert if doesnt exist. if my getGeneratedKeys().size() = 0 i know the insert failed because it exist and then i requery the database with the same where clause from my original query grabing the id. – Jeremy Jul 02 '13 at 19:33
  • @Jeremy Ah, that is a *much* clearer explanation of what you really want. – Craig Ringer Jul 02 '13 at 23:10
  • So performance wise is there a more effiecent way of doing this type of insert? My dataset i'm pulling the information from to populate the database is very very large. My fear is that by performing there where check twice i'm slowing down the process significantly. – Jeremy Jul 03 '13 at 14:25
  • I actually found a way to do this not sure if its pretty now the question is performance which is better. The question with my solution to accomplishing this is given in http://stackoverflow.com/questions/17454748/which-is-better-query-database-twice-or-create-a-function-to-query-once – Jeremy Jul 03 '13 at 17:42
  • How do I mark this question as awnsered by referencing another question on stackoverflow that contains the awnser? Simply put I found a solution myself posted it in another question regarding optimization and speeds. But I want to close this question and have people looking for same solution to look at the other posting? The solution is provided in the link above this. – Jeremy Jul 10 '13 at 18:05
  • @Jeremy Feel free to post your own answer that refers to and summarizes that one. Or "flag" your question and ask a moderator to close it as a duplicate of the other one if you feel it's nearly identical. – Craig Ringer Jul 10 '13 at 23:27