3

I have around 10 workers that perform a job that includes the following:

user = User.find_or_initialize_by(email: 'some-email@address.com')

if user.new_record?
  # ... some code here that does something taking around 5 seconds or so
elsif user.persisted?
  # ... some code here that does something taking around 5 seconds or so
end

user.save

The problem is that at certain times two or more workers run this code at exactly the same time, and I later found out that two or more Users ended up with the same email, when I should always end up with only unique emails.

It is not possible in my situation to create a DB unique index on email, as unique emails are conditional -- some Users should have a unique email, some do not.

It is worth mentioning that my User model has uniqueness validations, but they still don't help me because, between .find_or_initialize_by and .save, there is code that depends on whether the user object has already been created or not.

I tried pessimistic and optimistic locking, but neither helped me, or maybe I just didn't implement them properly... I would welcome suggestions regarding this.

The only solution I can think of is to lock the other threads (Sidekiq jobs) whenever these lines of code get executed, but I am not sure how to implement this, nor do I know whether it is even an advisable approach.

I would appreciate any help.

EDIT

In my specific case, it is going to be hard to pass the email as a parameter to the job, as this job is a little more complex than what was described above. The job is actually an export script, and the code above is only one section of it. I also don't think it is possible to split that functionality into a separate worker, as the whole job flow should be serial and no part should be processed in parallel / asynchronously. This job is just one of the jobs managed by another job, which is ultimately managed by the master job.

Jay-Ar Polidario

3 Answers

2

Pessimistic locking is what you want but only works on a record that exists - you can't use it with new_record? because there's nothing to lock in the DB yet.
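
For example, a rough sketch of pessimistic locking on a record that already exists (with_lock reloads the row inside a transaction and takes a row-level lock); it cannot cover the creation case:

user = User.find_by(email: 'some-email@address.com')

if user
  user.with_lock do
    # Other workers calling with_lock on the same row block here
    # until this transaction commits.
    # ... the ~5-second work for an existing user
    user.save
  end
else
  # New record: there is nothing in the DB to lock yet.
end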

Mike Perham
  • Yeah, that was my problem earlier as well when I was trying pessimistic locking -- the record still wasn't there. This is still quite unfamiliar to me, so I thought that maybe I did something wrong in how I approached it. I guess I'll try pessimistic locking again and maybe find a thread-locking hack to go along with it. Thanks – Jay-Ar Polidario Feb 12 '15 at 19:20
1

I managed to solve my problem with the following:

I found out that I can actually add a where clause to a Rails database unique index (a partial index), and with that I can now set up conditional uniqueness for different types of Users at the database level, so other concurrent jobs will raise an ActiveRecord::RecordNotUnique error if the record has already been created.
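
For reference, a minimal migration sketch of such a partial unique index (the is_unique_email column in the where clause is only a placeholder for whatever marks a User as needing a unique email; partial indexes require a database that supports them, such as PostgreSQL):

class AddPartialUniqueIndexToUsersEmail < ActiveRecord::Migration
  def change
    # Only rows matching the where clause must have a unique email.
    add_index :users, :email, unique: true, where: 'is_unique_email = true'
  end
end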

The only remaining problem was the code between .find_or_initialize_by and .save, since that code depends on the state of the User object: only one concurrent job should ever see .new_record? == true, and the other concurrent jobs should see .persisted? == true, because one job will always be the first to create the record. None of that works on its own, though, because the database uniqueness index is only checked at the .save line. I therefore solved this by performing .save before those conditions (capturing whether the record was new just before saving), and by wrapping .save in a rescue block that enqueues another copy of the job whenever ActiveRecord::RecordNotUnique is raised, so that concurrent jobs don't conflict. The code now looks like the following.

user = User.find_or_initialize_by(email: 'some-email@address.com')

begin
  # Capture the state before saving: after a successful save,
  # new_record? is always false and persisted? is always true.
  is_new_record = user.new_record?
  is_persisted  = user.persisted?
  user.save
rescue ActiveRecord::RecordNotUnique => exception
  # Another concurrent job created the same user first; re-enqueue this
  # job and clear the flags so neither branch below runs in this job.
  is_new_record = is_persisted = nil
  MyJob.perform_later(params_hash)
end

if is_new_record
  # do something if this job has just created the user
elsif is_persisted
  # do something if the user already existed
end
Jay-Ar Polidario
1

I would suggest a different architecture to bypass the problem.

How about a producer-worker model, where one master Sidekiq process gets a list of email addresses and then spawns a worker Sidekiq job for each email? Sidekiq makes this easy with a dedicated queue for the master and workers to communicate.

Doing so, the email address becomes an input parameter of the workers, so we know by construction that workers will not stomp on each other's data.
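
A rough sketch of that layout, assuming Sidekiq workers (the class names and the source of the email list are placeholders):

class MasterExportWorker
  include Sidekiq::Worker

  def perform
    # Producer: fan out one worker job per email address.
    User.pluck(:email).each do |email|
      PerEmailExportWorker.perform_async(email)
    end
  end
end

class PerEmailExportWorker
  include Sidekiq::Worker

  def perform(email)
    # Each worker owns exactly one email, so no two jobs touch the same User.
    user = User.find_or_initialize_by(email: email)
    # ... the per-user work goes here
    user.save
  end
end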

Eric Platon
  • There was another answer here, but he deleted it. I mentioned there that in my specific case it is going to be hard to pass the email as a parameter to the job, as this job is a little more complex than what was described above. The job is actually an export script, and the code above is only one section of it. I also don't think it is possible to split that functionality into a separate worker, as the whole job flow should be serial and no part should be processed in parallel / asynchronously. I do appreciate your gesture though. Thanks. – Jay-Ar Polidario Feb 16 '15 at 13:23
  • I see. What do you think of making your question a bit narrower, then? Adding the detail of your comment would make the whole thread more useful to people who face similar issues. I do not know your exact situation, but (in general) it may be worth trying to break the task down into short pieces (recommended by Mike Perham, the creator of Sidekiq). Overall it seems you need transactional processing. That is a bit more work, but a master process could manage a transaction for its workers, too... – Eric Platon Feb 16 '15 at 13:36
  • I initially thought of simplifying my question, but I guess you're right; I will update my question to include the specifics. Yes, we're currently implementing transactional processing, and we actually have a lot of independent jobs (this is one of them). However, I think I am beginning to understand what you meant by a master process, specifically when you mentioned 'manage a transaction for its workers'. I will discuss the prospect with my colleagues. Thanks again. – Jay-Ar Polidario Feb 16 '15 at 15:57