I'm wondering about an update I'm making to a large table, and whether I need to worry about locks.
I have a table that looks like this:
CREATE TABLE "ItemsToProcess" (
    "id" text,
    "WorkerInstanceId" text,
    "ProcessingStartTime" timestamp with time zone,
    "UpdatedTime" timestamp with time zone,
    CONSTRAINT "ITP_PK" PRIMARY KEY ("id")
) WITH (
    OIDS=FALSE
);
Initially, this table has ~2.0 million rows with only the id column filled in; WorkerInstanceId and the two timestamps are NULL by default and at the start of the run.
Some worker apps (at least two, and around 10-13 in production) will each mark a batch of IDs (I plan to set batchSize to 200) from this table for them to process. What happens during the processing itself doesn't really matter here.
The marking of a batch looks like this:
UPDATE "ItemsToProcess"
SET "WorkerInstanceId" = ?, "ProcessingStartTime" = current_timestamp
WHERE "id" IN (
    SELECT "id" FROM "ItemsToProcess"
    WHERE "WorkerInstanceId" IS NULL
    LIMIT 200
);
My question is, do I need to worry about locking the rows I'm going to update before making the update?
The Postgres documentation says:
ROW EXCLUSIVE
Conflicts with the SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
The commands UPDATE, DELETE, and INSERT acquire this lock mode on the target table (in addition to ACCESS SHARE locks on any other referenced tables). In general, this lock mode will be acquired by any command that modifies the data in a table.
So I think that whenever one of the workers runs this update, the whole table is locked, 200 rows are updated, and the lock is released at the end. While the lock is in place, the other workers wait for it to be released. Is this right, or am I missing something?
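To make the claiming scheme I have in mind concrete, here is a minimal in-memory sketch in plain Python (no database or real locking involved; the `claim_batch` helper, the item ids, and the worker names are purely illustrative, not part of my actual code) of what I expect each batch claim to do: take up to 200 unclaimed ids and stamp them with the claiming worker's id, so that no id ends up owned by two workers.

```python
# Illustrative in-memory model of the batch-claiming scheme (no real DB/locks).
BATCH_SIZE = 200

# One entry per row: id -> WorkerInstanceId (None means unclaimed).
items = {f"item-{n}": None for n in range(1000)}

def claim_batch(worker_id):
    """Mark up to BATCH_SIZE unclaimed items as owned by worker_id."""
    claimed = []
    for item_id, owner in items.items():
        if owner is None:
            items[item_id] = worker_id
            claimed.append(item_id)
            if len(claimed) == BATCH_SIZE:
                break
    return claimed

batch_a = claim_batch("worker-A")
batch_b = claim_batch("worker-B")
# Each batch holds at most 200 ids, and the two batches are disjoint.
```

In the real system the atomicity of `claim_batch` would have to come from the database, which is exactly what my locking question is about.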