5

I'm trying to update multiple rows using postgres, I'm using this code:

UPDATE foobar SET column_a = CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
END;

If I create a new table this works flawlessly, but when running this on a large table with 8 million rows this hangs indefinitely. I've tried first in Admineer (web interface) and also in console.

However this works just fine:

UPDATE foobar SET column_a=1 WHERE column_b='123';

I'm hesitant to implement this approach into my code since I'll have thousands of updates at once and would prefer to put them in one statement. Any ideas as to why the first example would hang postgres and the second would work just fine? I just double checked and I have no rules applied on the table.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
newUserNameHere
  • 17,348
  • 18
  • 49
  • 79
  • 1
    You are really trying to update 8 million rows? Could you please provide the real commands and `EXPLAIN ANALYZE` of them? These ones means almost nothing... – MatheusOl Sep 14 '13 at 03:52
  • Seems to be the follow-up to this question: http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-query-using-postgresql – Erwin Brandstetter Sep 14 '13 at 16:07

2 Answers2

6

The problem was ..

The statement:

CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
END;

.. is just short for:

CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
  ELSE NULL
END

Meaning, without a WHERE clause, your UPDATE statement is not just "trying", it actually updates every single row in the table, most of them to NULL.
Maybe, a NOT NULL constraint on the column prevented data loss ...

The better solution is ..

I'll have thousands of updates at once and would prefer to put them in one statement.

Much faster (and shorter) for large sets:

UPDATE foobar f
SET    column_a = val.a
FROM  (
   VALUES
     (123, 1)
    ,(345, 2)
   ) val(b, a)
WHERE f.column_b = val.b

Joining to a set beats iterating through a long list of CASE branches for every row easily. The difference will grow rapidly with longer lists.

Also, be sure to have an index on column_b either way.

You can replace the VALUES expression with any table, view or subselect yielding appropriate rows.

Note:
I am assuming that column_a and column_b are of type integer. In this case, the single quotes around '123' in your question were never helpful. You'd better use a numeric literal instead of a string literal. (Even though it works with string literals, too.)

A string literal like '123' defaults to unknown type.
A numeric literal like 123 defaults to integer - or bigint / numeric if the number is too big.

If you were dealing with non-default data types, you would have to cast explicitly. Would look like:

...
FROM  (
   VALUES
     ('123'::sometype, '1'::sometype)  -- first row defines row type
    ,('345', '2')
   ) val(b, a)
...
Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • this is related question I think http://stackoverflow.com/questions/18797608/update-multiple-rows-in-same-query-using-postgresql – Roman Pekar Sep 14 '13 at 15:54
  • @ErwinBrandstetter actually I like update ... from solution not only for speed but because it's generalize much better - in case you need two or more columns, you don't have to write additional cases, I've mentioned that in my answer – Roman Pekar Sep 14 '13 at 16:07
  • @RomanPekar - The two are related. The first solution I got on the other question did not include the where statement. I was confused because it was a working solution and was getting upvotes, I assumed I was doing something wrong. It wasn't till after I posted this that that question got corrected by MatheusOl, which then solved this question as well. – newUserNameHere Sep 14 '13 at 20:55
2

I'm keeping this question up in case anyone runs into this issue.

This query was the culprit:

UPDATE foobar SET column_a = CASE
  WHEN column_b = '123' THEN 1
  WHEN column_b = '345' THEN 2
END;

Problem is that it's missing a WHERE statement so it's trying to update all rows. With large databases this can be an issue, in my case it just timed out. As soon as I added the where statement in there it fixed the issue.

Here's the solution:

UPDATE foobar SET column_a = CASE
   WHEN column_b = '123' THEN 1
   WHEN column_b = '345' THEN 2
END
WHERE column_b IN ('123','345')
newUserNameHere
  • 17,348
  • 18
  • 49
  • 79