
I am writing a little PHP script that simply returns data from a MySQL table using the query below:

"SELECT * FROM data where status='0' limit 1";

After reading the data, I update the status of that particular row by its id, using the query below:

"Update data set status='1' WHERE id=" . $db_field['id'];

Things are working fine for a single client. Now I want to make this particular page available to multiple clients. There are more than 20 clients that will access the same page at almost the same time, continuously (24/7). Is there a possibility that two or more clients read the same data from the table? If yes, how do I solve it?
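
For clarity, the flow looks roughly like this (a simplified sketch using mysqli for illustration; connection details are placeholders and the real script may differ):

<?php
// Simplified sketch of the current flow (mysqli assumed; connection details are placeholders).
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// Step 1: read the next unprocessed row.
$result   = $db->query("SELECT * FROM data WHERE status='0' LIMIT 1");
$db_field = $result->fetch_assoc();

// ... the data is sent to the client here ...

// Step 2: mark that row as processed.
// Between step 1 and step 2, another client running the same SELECT
// could receive the same row - this is the overlap I am worried about.
$db->query("UPDATE data SET status='1' WHERE id=" . (int)$db_field['id']);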

Thanks

user934820
  • Could you clarify whether both of those SQL calls are made within a single HTTP request, or is the flow that you select, send to the user, they do something, and then you update on their second HTTP request? Without that context, I can't speculate. The risk you raise occurs in either scenario, but your solution choices vary depending on which case it is. – barry-johnson Mar 03 '14 at 04:55
  • Yes, in a single HTTP request. The user doesn't do anything to the data except read it. – user934820 Mar 03 '14 at 04:57
  • In that case, you have a couple of generally acceptable options. The easiest would be adapting your logic into a single update command, so you could do `update .... where status=0 limit 1`. You could also use transactions. I suppose a third option would be an initial update that sets a client-specific semaphore of some sort on status, select on that status, do your thing, and then update the status again. – barry-johnson Mar 03 '14 at 05:08
  • Semaphore concept laid out in a bit of detail in my answer below. – erik258 Mar 03 '14 at 08:58

3 Answers


You are right to consider concurrency. Unless you have only one PHP thread responding to client requests, there's really nothing to stop each of them from handing out the same row from data to be processed - in fact, since they each run the same query, they'll almost certainly each hand out the same row.

The easiest way to solve that problem is locking, as suggested in the accepted answer. That may work if the time the PHP server thread takes to run the SELECT ... FOR UPDATE or LOCK TABLES ... UNLOCK TABLES (non-transactional) is minimal, such that other threads can wait while each thread runs this code (it's still wasteful, as they could be processing some other data row, but more on that later).
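
For illustration, that locking approach might look roughly like this (a sketch only, assuming an InnoDB table, mysqli, and the question's column names; not the asker's actual code):

<?php
// Rough sketch of the row-locking approach (InnoDB and mysqli assumed; illustrative only).
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

$db->query("START TRANSACTION");
// FOR UPDATE locks the selected row until COMMIT, so another thread running
// the same statement waits here instead of receiving the same row.
$result = $db->query("SELECT * FROM data WHERE status='0' LIMIT 1 FOR UPDATE");
$row    = $result->fetch_assoc();

if ($row) {
    $db->query("UPDATE data SET status='1' WHERE id=" . (int)$row['id']);
}
$db->query("COMMIT");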

There is a better solution, though it requires a schema change. Imagine you have a table such as this:

CREATE TABLE `data` (
  `data_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` blob,
  `status` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`data_id`)
) ENGINE=InnoDB;

You don't have any way to transactionally update "the next record to be processed" because the only field you have to update is status. But imagine your table looks more like this:

CREATE TABLE `data` (
  `data_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `data` blob,
  `status` tinyint(1) DEFAULT '0',
  `processing_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`data_id`)
) ENGINE=InnoDB;

Then you can write a query something like this to tag the "next" row to be processed with your processing id:

UPDATE data  
SET processing_id = @unique_processing_id 
WHERE processing_id IS NULL and status = 0 LIMIT 1;

And any SQL engine worth a darn will make sure you don't have 2 distinct processing IDs accounting for the same record to be processed at the same time. Then at your leisure, you can

SELECT * FROM data WHERE processing_id = @unique_processing_id;

and know that you're getting a unique record every time.

This approach also lends itself well to durability concerns; you're basically identifying the batch processing run per data row, meaning you can account for each batch job, whereas before you were potentially only accounting for the data rows.

I would probably implement the @unique_processing_id by adding a second table for this metadata (the auto-increment key is the real trick to this, but other data processing metadata could be added):

CREATE TABLE `data_processing` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `data_id` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

and using that as a source for your unique IDs, you might end up with something like:

INSERT INTO data_processing SET date = NOW();
SET @unique_processing_id = (SELECT LAST_INSERT_ID());
-- claim exactly one unclaimed, unprocessed row for this run
UPDATE data 
SET processing_id = @unique_processing_id 
  WHERE processing_id IS NULL AND status = 0 LIMIT 1;
-- record which data row this processing run claimed
UPDATE data 
    JOIN data_processing ON data_processing.id = data.processing_id
  SET data_processing.data_id = data.data_id
  WHERE data_processing.id = @unique_processing_id;
SELECT * FROM data WHERE processing_id = @unique_processing_id;
-- you are now ready to marshal the data to the client ... and ... 
UPDATE data SET status = 1 
    WHERE status = 0 
    AND processing_id = @unique_processing_id
LIMIT 1;

This solves your concurrency problem and puts you in better shape to audit for durability as well, depending on how you set up the data_processing table; you could track thread IDs, processing state, etc. to help verify that the data really has finished being processed.
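
Driven from PHP, that sequence might look roughly like this (a sketch assuming mysqli, with error handling omitted and placeholder connection details):

<?php
// Sketch of driving the processing_id flow from PHP (mysqli assumed; error handling omitted).
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// Create a processing record and grab its auto-increment id.
$db->query("INSERT INTO data_processing SET date = NOW()");
$processing_id = $db->insert_id;

// Atomically claim one unclaimed, unprocessed row for this run.
$db->query("UPDATE data SET processing_id = $processing_id
            WHERE processing_id IS NULL AND status = 0 LIMIT 1");

if ($db->affected_rows === 1) {
    // Record which data row this processing run claimed.
    $db->query("UPDATE data JOIN data_processing
                   ON data_processing.id = data.processing_id
                  SET data_processing.data_id = data.data_id
                WHERE data_processing.id = $processing_id");

    // Safe to read: no other client can have claimed the same row.
    $result = $db->query("SELECT * FROM data WHERE processing_id = $processing_id");
    $row    = $result->fetch_assoc();

    // ... marshal $row to the client, then mark it done ...
    $db->query("UPDATE data SET status = 1 WHERE processing_id = $processing_id");
} else {
    // Nothing left to process right now.
}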

There are other solutions - a message queue might be ideal, allowing you to queue each unprocessed data object's ID to the clients directly (or through a PHP script) and then provide an interface for that data to be retrieved and marked processed, separately from the queueing of the "next" data. But as far as "mysql-only" solutions go, the concepts behind what I've shown you here should serve you pretty well.

erik258

The answer you seek might be using transactions. I suggest you read the following post and its accepted answer:

PHP + MySQL transactions examples

If not, there is also table locking, which you should look at:

13.3.5 LOCK TABLES and UNLOCK TABLES
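
For example, the table-locking variant could look roughly like this (a sketch only, assuming mysqli and the table from the question; adapt as needed):

<?php
// Sketch of the LOCK TABLES variant (mysqli and the question's table assumed; illustrative only).
$db = new mysqli('localhost', 'user', 'pass', 'mydb');

// WRITE lock: other connections wait here until UNLOCK TABLES runs.
$db->query("LOCK TABLES data WRITE");

$result = $db->query("SELECT * FROM data WHERE status='0' LIMIT 1");
$row    = $result->fetch_assoc();

if ($row) {
    $db->query("UPDATE data SET status='1' WHERE id=" . (int)$row['id']);
}

$db->query("UNLOCK TABLES");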

Jack M.
  • A locking system is really the thing I want. I think a particular row can also be locked with `SELECT * FROM data WHERE status='0' LIMIT 1 FOR UPDATE`; then, while reading with `$db_field = mysql_fetch_assoc($result)`, I store the data in a variable and then update it. Will that work? – user934820 Mar 03 '14 at 05:17
  • The transactional approach may work (InnoDB required), but since you're not specifying an order at all, it's very likely that each query will try to select the same row - thus basically guaranteeing that you'll be blocking other threads. – erik258 Mar 03 '14 at 05:47
  • We lock tables, not rows. But from what you say you might want transactions, not locking. – Jack M. Mar 03 '14 at 05:48
  • I'm surprised by how often that point is misunderstood. You _can_ lock an entire table, and if you're using eg MyISAM that's your only option, but "real" transactional processing still uses locking. See http://dev.mysql.com/doc/refman/5.0/en/innodb-lock-modes.html - the difference is whether the entire table needs to be locked, or whether the locks can be made on a row-by-row basis. – erik258 Mar 03 '14 at 06:20

I suggest you use the session for this... you can save that id into the session... that way you can check whether one client is already working on that record, and if so, not allow another client to access it...

Nishant Solanki
  • Maybe I'm misunderstanding something, but does this really answer the question? – erik258 Mar 03 '14 at 05:36
  • Thanks for the help, but this kind of solution can't be used in my case because all the clients are bots assigned the job of reading data from a particular page and processing it. At the same time, it is necessary that data not be processed multiple times. For that, I update the status of the row after every read. – user934820 Mar 03 '14 at 05:36
  • Ohh, I am really sorry.. maybe I misunderstood the question.. please check the `edited answer`... – Nishant Solanki Mar 03 '14 at 05:40