286

I have a bunch of rows that I need to insert into table, but these inserts are always done in batches. So I want to check if a single row from the batch exists in the table because then I know they all were inserted.

So its not a primary key check, but shouldn't matter too much. I would like to only check single row so count(*) probably isn't good, so its something like exists I guess.

But since I'm fairly new to PostgreSQL I'd rather ask people who know.

My batch contains rows with following structure:

userid | rightid | remaining_count

So if table contains any rows with provided userid it means they all are present there.

Jeff Widman
  • 22,014
  • 12
  • 72
  • 88
Valentin Kuzub
  • 11,703
  • 7
  • 56
  • 93
  • You want to see if the table has ANY rows, or any rows from your batch? – JNK Sep 19 '11 at 13:24
  • any rows from my batch yes. they all share same field ill edit a little. – Valentin Kuzub Sep 19 '11 at 13:25
  • Please clarify your question. You want to add a batch of records, all or nothing? Is there something special about count ? (BTW a reserved word, impractical as a column name) – wildplasser Sep 19 '11 at 14:08
  • okay, I was trying to simplify actual situation a little but we are getting closer and closer to real implementation. Once those rows are inserted (theres another field for_date) I begin decrementing rights for specified user as they use specific rights, once rights become 0 they cannot perform those actions anymore for that date. thats the real story – Valentin Kuzub Sep 19 '11 at 14:11
  • 1
    Just show (the relevant part of) the table definitions, and tell what you intend to do. – wildplasser Sep 19 '11 at 14:15
  • [userid, rightid, for_date, remainingCount] thats the real table. I intend to insert into this table when user attempts to use any right. I could have created a speciall process that is responsible for inserting rows daily but currently I am checking whether another approach will suit me performance wise. – Valentin Kuzub Sep 19 '11 at 14:20
  • remainingCount with a capital "C"? I'd advice to stick to lower case (remaining_count), or you'll always have to quote this identifier: "remainingCount". Also there is no such thing as "postgre". It's "PostgreSQL" or "postgres". And lastly, it's still unclear whether all rows to be inserted at a time share the *same* userid. – Erwin Brandstetter Sep 19 '11 at 19:41
  • oh well. I copy pasted that from parameter list in C# code don't worry Erwin. I thought it should be clear, but to specify that more, batch contains rows which do share userid, thats why, checking that single row with specified userid makes sense first of all.. – Valentin Kuzub Sep 19 '11 at 19:56

7 Answers7

556

Use the EXISTS key word for TRUE / FALSE return:

select exists(select 1 from contact where id=12)
jdhao
  • 24,001
  • 18
  • 134
  • 273
Michael M
  • 8,185
  • 2
  • 35
  • 51
  • 39
    Extension on this, you can name the returned column for easy reference. Eg `select exists(select 1 from contact where id=12) AS "exists"` – Rowan Sep 15 '13 at 23:16
  • 4
    This is better, because it will always return a value (true or false) instead of sometimes None (depending on your programing language) which might not expand the way you expect. – isaaclw Jun 10 '14 at 23:18
  • 2
    I have Seq Scan with using this method. I do something wrong? – FiftiN Jul 06 '16 at 07:42
  • 1
    @FiftiN :: This might answer your question: http://stackoverflow.com/questions/5203755/why-does-postgresql-perform-sequential-scan-on-indexed-column – Michael M Jul 06 '16 at 17:14
  • 3
    @Michael.M I have DB table with 30 millions rows and when I use ```exists``` or ```limit 1``` I have strong performance drop because Postgres uses Seq Scan instead of Index Scan. And ```analyze``` doesn't help. – FiftiN Jul 07 '16 at 10:41
  • 2
    Would `limit 1` in subquery here help or slowen the query? – maciek Sep 26 '18 at 07:27
  • 2
    @maciek please understand that ‘id’ is a primary key, so “LIMIT 1” would be pointless since there is only one record with that id – Michael M Sep 26 '18 at 18:45
  • Is this better than SELECT id FROM table WHERE something=%s – CodeGuru Nov 27 '19 at 23:31
  • 1
    @CodeGuru, Depends. A SERIAL is 4 times bigger than a BYTE. For every 1MM positives you will marshal 3MB more data than just using `exists`, and you just end up throwing that data away. 7MB more for BIGSERIAL. Also, it may be more efficient, depending on your calling code, to just handle the bool in the response rather than evaluating if there even is a record or not. Without knowing specifics of the OP's case, its impossible to tell. In any case, the code is easier to read if we know we always get a bool. – Michael M Nov 29 '19 at 23:24
  • @MikeM Thank you for explaining and yes I tested and went with your recommendation. – CodeGuru Nov 30 '19 at 04:04
  • 1
    @maciek It helps _if_ you’re not comparing a `UNIQUE` (including `PRIMARY KEY`) column. Otherwise, pointless. – Константин Ван Apr 27 '21 at 06:55
47

How about simply:

select 1 from tbl where userid = 123 limit 1;

where 123 is the userid of the batch that you're about to insert.

The above query will return either an empty set or a single row, depending on whether there are records with the given userid.

If this turns out to be too slow, you could look into creating an index on tbl.userid.

if even a single row from batch exists in table, in that case I don't have to insert my rows because I know for sure they all were inserted.

For this to remain true even if your program gets interrupted mid-batch, I'd recommend that you make sure you manage database transactions appropriately (i.e. that the entire batch gets inserted within a single transaction).

NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • 13
    It might be sometimes be programatically easier to "select count(\*) from (select 1 ... limit 1)" as it's guaranteed to always return a row with a value of count(\*) of 0 or 1. – David Aldridge May 09 '13 at 18:33
  • @DavidAldridge count(*) still means that all the rows have to be read, whereas limit 1 stops at the first record and returns – Imraan Dec 12 '13 at 13:36
  • 6
    @Imraan I think you've misinterpreted the query. The `COUNT` acts on a nested `SELECT` that has at most 1 row (because the `LIMIT` is in the subquery). – jpmc26 Jan 17 '14 at 11:37
11
INSERT INTO target( userid, rightid, count )
  SELECT userid, rightid, count 
  FROM batch
  WHERE NOT EXISTS (
    SELECT * FROM target t2, batch b2
    WHERE t2.userid = b2.userid
    -- ... other keyfields ...
    )       
    ;

BTW: if you want the whole batch to fail in case of a duplicate, then (given a primary key constraint)

INSERT INTO target( userid, rightid, count )
SELECT userid, rightid, count 
FROM batch
    ;

will do exactly what you want: either it succeeds, or it fails.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • This will check each row. He wants to do a single check. – JNK Sep 19 '11 at 13:42
  • 1
    No, it does a single check. The subquery is uncorrelated. It will bail out once one matching pair is found. – wildplasser Sep 19 '11 at 13:45
  • Right you are, I thought it referred to the outer query. +1 to you – JNK Sep 19 '11 at 13:47
  • BTW: since the query is inside a transaction, nothing will happen if a duplicate id were to be inserted, hence the subquery can be omitted. – wildplasser Sep 19 '11 at 13:56
  • hmm I am not sure I understand. After rights are inserted, I begin to decrement count column. (just some details for picture) If rows already exist and subquery is omitted I think ill get errors with duplicate unique key thrown or? (userid&right form that unique key) – Valentin Kuzub Sep 19 '11 at 14:00
5
select true from tablename where condition limit 1;

I believe that this is the query that postgres uses for checking foreign keys.

In your case, you could do this in one go too:

insert into yourtable select $userid, $rightid, $count where not (select true from yourtable where userid = $userid limit 1);
Royce
  • 532
  • 3
  • 11
4

as @MikeM pointed out.

select exists(select 1 from contact where id=12)

with index on contact, it can usually reduce time cost to 1 ms.

CREATE INDEX index_contact on contact(id);
hcnak
  • 428
  • 7
  • 18
  • 3
    Cost of 1ms is huge - can only do 1000 such checks per second. Something around 10M checks per second should be aimed. – Meglio Nov 04 '21 at 03:56
4
SELECT 1 FROM user_right where userid = ? LIMIT 1

If your resultset contains a row then you do not have to insert. Otherwise insert your records.

Fabian Barney
  • 14,219
  • 5
  • 40
  • 60
4

If you think about the performace ,may be you can use "PERFORM" in a function just like this:

 PERFORM 1 FROM skytf.test_2 WHERE id=i LIMIT 1;
  IF FOUND THEN
      RAISE NOTICE ' found record id=%', i;  
  ELSE
      RAISE NOTICE ' not found record id=%', i;  
 END IF;
aleroot
  • 71,077
  • 30
  • 176
  • 213
francs
  • 8,511
  • 7
  • 39
  • 43