2

I have a task thread running in each of two separate Tomcat instances. The task threads concurrently read the TASKS table (using a SELECT with a certain WHERE condition) and then do some processing.

The issue is that sometimes both threads pick up the same task, so the task is executed twice. My question is: how do I keep the two threads from reading the same set of rows from the TASKS table?

Sudhakar
  • You have to think about synchronizing and isolation. – duffymo Nov 15 '11 at 10:20
  • Synchronizing at the Java level is not possible, as the threads run on two separate Tomcat machines. Are you referring to the DB level? If so, please add more details on which isolation strategy is best suited and how to apply it. Thanks. – Sudhakar Nov 15 '11 at 10:30
  • You have to set the isolation on the database connections to serializable. It'll cost you some performance, but you'll buy correctness with it. – duffymo Nov 15 '11 at 13:38
  • Yes, that is one option, but since I use Hibernate, setting the isolation to serializable would affect the performance of the entire application. Is there a way to set the isolation strategy for a specific table? – Sudhakar Nov 16 '11 at 06:36

6 Answers

0

This happens because the DAO method that accesses the database is not synchronized. Make it synchronized and I think your problem will be solved.

Abhinav
  • No, that is not the issue here; I have clearly mentioned that the threads run on two different Tomcats. – Sudhakar Nov 15 '11 at 11:33
0

If the TASKS table you mention is a database table, then I would use transaction isolation.

As a suggestion: within a transaction, set an attribute (column) of the TASKS row to some uniquely identifiable value if it is not already set, then commit the transaction. If all of that succeeds, the task has been selected by that thread.
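
One way this could look with plain JDBC is a conditional UPDATE, sketched below. It is only a sketch under assumptions: the `ID` and `CLAIMED_BY` columns, the `TaskClaimer` class and the `tryClaim` method are hypothetical names that do not come from the question. The idea is that the database applies the `CLAIMED_BY IS NULL` check and the write as a single statement, so when two instances race for the same row only one of them should see an update count of 1.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class TaskClaimer {
    // Hypothetical schema: TASKS(ID, CLAIMED_BY, ...). Adjust to the real table.
    static final String CLAIM_SQL =
        "UPDATE TASKS SET CLAIMED_BY = ? WHERE ID = ? AND CLAIMED_BY IS NULL";

    /**
     * Tries to mark the task as owned by workerId.
     * Returns true only if this call changed the row, i.e. nobody had claimed it yet.
     * The caller is responsible for committing the transaction.
     */
    static boolean tryClaim(Connection conn, long taskId, String workerId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(CLAIM_SQL)) {
            ps.setString(1, workerId);
            ps.setLong(2, taskId);
            // 1 row updated: we own the task. 0 rows: it was already claimed.
            return ps.executeUpdate() == 1;
        }
    }
}
```

A worker would call `tryClaim` for each candidate row returned by its SELECT, commit, and then process only the tasks for which the call returned `true`.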

I haven't come across this use case, so treat my suggestion with caution.

Brett Walker
0

I think you should look at how an enterprise job scheduler, for example Quartz, handles this.

Vladislav Bauer
0

For your use case there is a better tool for the job - and that's messaging. You are persisting items that need to be worked on, and then attempting to synchronise access between workers. There are a number of issues that you would need to resolve in making this work - in general updating a table and selecting from it should not be mixed (it locks), so storing state there doesn't work; neither would synchronization in your Java code, as that wouldn't survive a server restart.

Using the JMS API with a message broker like ActiveMQ, you would publish a message to a queue. This message would contain the details of the task to be executed. The message broker would persist this somewhere (either in its own message store, or a database). Worker threads would then subscribe to the queue on the message broker, and each message would only be handed off to one of them. This is quite a powerful model, as you can have hundreds of message consumers all acting on tasks so it scales nicely. You can also make this as resilient as it needs to be, so tasks can survive both Tomcat and broker restarts.
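
To illustrate the "each message is handed to only one consumer" behaviour, here is a small in-process stand-in. It is not JMS and not ActiveMQ: it uses a `java.util.concurrent.BlockingQueue` inside a single JVM, so it does not solve the two-Tomcat case by itself. The class and method names (`CompetingConsumersDemo`, `run`) are made up for this sketch. It only shows the delivery pattern that a broker queue would give you across machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class CompetingConsumersDemo {
    private static final int STOP = -1; // marker telling a worker to shut down

    /** Publishes taskCount task ids, lets `workers` threads consume them, and
     *  returns how many times each task id was handled. */
    static Map<Integer, Integer> run(int taskCount, int workers) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        Map<Integer, Integer> handled = new ConcurrentHashMap<>();

        Runnable worker = () -> {
            try {
                while (true) {
                    int taskId = queue.take(); // an element goes to exactly one taker
                    if (taskId == STOP) {
                        return;
                    }
                    handled.merge(taskId, 1, Integer::sum); // count deliveries per task
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(worker, "worker-" + i);
            threads.add(t);
            t.start();
        }
        for (int id = 0; id < taskCount; id++) {
            queue.add(id); // "publish" a task
        }
        for (int i = 0; i < workers; i++) {
            queue.add(STOP); // one stop marker per worker, queued after all tasks
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return handled;
    }
}
```

Calling `CompetingConsumersDemo.run(100, 4)` should give a map in which every task id has a count of 1, no matter how the four workers interleave.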

Jakub Korab
0

Whether the database can provide graceful management of this will depend largely on whether it is using strict two-phase locking (S2PL) or multi-version concurrency control (MVCC) techniques to manage concurrency. Under MVCC reads don't block writes, and vice versa, so it is very possible to manage this with relatively simple logic. Under S2PL you would spend too much time blocking for the database to be a good mechanism for managing this, so you would probably want to look at external mechanisms. Of course, an external mechanism can work regardless of the database, it's just not really necessary with MVCC.

Databases using MVCC are PostgreSQL, Oracle, MS SQL Server (in certain configurations), InnoDB (except at the SERIALIZABLE isolation level), and probably many others. (These are the ones I know of off-hand.)

I didn't pick up any clues in the question as to which database product you are using, but if it is PostgreSQL you might want to consider using advisory locks: http://www.postgresql.org/docs/current/interactive/explicit-locking.html#ADVISORY-LOCKS. I suspect many of the other products have some similar mechanism.
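
If it is PostgreSQL, a rough JDBC sketch of the advisory-lock idea might look like the following. `pg_try_advisory_lock` and `pg_advisory_unlock` are PostgreSQL functions; the `AdvisoryTaskLock` class, its method names, and the choice of the task id as the lock key are assumptions made for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

final class AdvisoryTaskLock {
    static final String TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(?)";
    static final String UNLOCK_SQL = "SELECT pg_advisory_unlock(?)";

    /** Non-blocking: returns true if this session now holds the lock for taskId. */
    static boolean tryLock(Connection conn, long taskId) throws SQLException {
        return callBooleanFunction(conn, TRY_LOCK_SQL, taskId);
    }

    /** Releases the lock; returns false if this session did not hold it. */
    static boolean unlock(Connection conn, long taskId) throws SQLException {
        return callBooleanFunction(conn, UNLOCK_SQL, taskId);
    }

    // Runs a single-argument SQL function that returns one boolean value.
    private static boolean callBooleanFunction(Connection conn, String sql, long key)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() && rs.getBoolean(1);
            }
        }
    }
}
```

These are session-level locks, so they stay held until they are unlocked or the database session ends. With a connection pool, `tryLock` and `unlock` would need to run on the same connection, and the lock should be released before that connection goes back to the pool.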

kgrittn
-1

I think you need to have a column that stores the last-modified date of each row. Your threads can then read the same set of data with the same modified-date limitation.

Edit: I did not see "not to read"

In this case you need another table, TaskExecutor (taskId, executorId). When a thread runs a task, it puts a row into TaskExecutor; when another thread starts, it just checks whether the task is already executing (SELECT ... FROM TaskExecutor WHERE taskId = ...). You also need to take care of the isolation level for transactions.
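
A possible JDBC sketch of the TaskExecutor idea is below. It makes one assumption that is not stated above: `taskId` is the primary key (or has a unique constraint), so the INSERT itself acts as the check and there is no gap between a separate SELECT and the INSERT. The `TaskExecutorRegistry` class and its method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

final class TaskExecutorRegistry {
    // Assumes TaskExecutor.taskId is a primary key or has a unique constraint.
    static final String INSERT_SQL =
        "INSERT INTO TaskExecutor (taskId, executorId) VALUES (?, ?)";

    /** Returns true if this executor registered the task, false if another one already had. */
    static boolean tryRegister(Connection conn, long taskId, String executorId)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, taskId);
            ps.setString(2, executorId);
            ps.executeUpdate();
            return true;
        } catch (SQLException e) {
            if (isConstraintViolation(e)) {
                return false; // most likely a duplicate taskId: someone else got there first
            }
            throw e;
        }
    }

    /** Drivers differ: some throw the JDBC 4 subclass, others only set an SQLState
     *  in class 23 (integrity constraint violation). */
    static boolean isConstraintViolation(SQLException e) {
        String state = e.getSQLState();
        return e instanceof SQLIntegrityConstraintViolationException
            || (state != null && state.startsWith("23"));
    }
}
```

Rows for finished or crashed executors would still need to be cleaned up somehow; that part is left out here.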

Sergey Gazaryan