Possible Duplicate:
Example for boost shared_mutex (multiple reads/one write)?

I am trying to use Boost's shared_lock and unique_lock to implement a basic reader-writer lock on a resource. However, some of the threads accessing the resource may simply crash. I want to create another process that, given a mutex, monitors it and keeps track of which processes have locked the resource and how long each has held the lock. The monitor would also force a process to release its lock if it has held it for longer than a given period of time.

Any suggestions on how to approach this problem are greatly appreciated!

Jin
  • Eliminating the crashes would be the preferred approach. Band-aiding over crashes rarely ends well. – Martin James Nov 09 '12 at 18:46
  • You should wrap the crash prone code segments in `try catch` blocks and release the critical resources there if possible. – didierc Nov 09 '12 at 20:06
  • try/catch is for programming errors, not for crashes (unless you are talking Windows structured exception handling, which is a different beast). – Peeter Joot Nov 09 '12 at 20:08
  • @PeeterJoot you are totally right: somehow I was assuming that the crashes were uncaught exceptions. If that's the case, it doesn't hurt to have some precautions in place, but indeed there are many more possible reasons for a thread crashing on its own than just exceptions. – didierc Nov 09 '12 at 20:29
  • Well, a server crash can still happen, and it can't be avoided. The example @DavidTitarenco gave doesn't help, since on a crash the process gets SIGSEGV and no destructors are called, so the locks won't auto-unlock when they go out of scope. – Jin Nov 09 '12 at 23:11

1 Answer


If you force the process holding the lock to release it, then you have defeated the purpose of the lock. Imagine that the mutex pSharedMem->m protects access to some bit of memory, pSharedMem->mystuff:

pSharedMem->m.get_lock() ;
sleep( LONG_TIME ) ;

// wake up, not knowing that your "deadlock detector"
// has released your mutex

pSharedMem->mystuff++ ; // oh-oh... access to shared memory
                        // without the guarding mutex held.
                        // Who knows what will happen!

pSharedMem->m.release_lock() ; // you may very well trap or hit some
                             // system specific error because
                             // the mutex is no longer held.

(Written out with explicit get_lock() and release_lock() calls to highlight the scope of the lock hold.)
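To make the hazard above concrete, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: RevocableLock, its ticket scheme, and demo_forced_release() are hypothetical names, not part of Boost or of the asker's code, and the interleaving of the "slow holder", the monitor, and a second worker is simulated sequentially so that the outcome is deterministic.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical lock that a monitor is allowed to release by force.
// owner_ holds the ticket of the current holder, or 0 when the lock is free.
class RevocableLock {
public:
    // Spin until the lock is free, then claim it. Returns this hold's ticket.
    std::uint64_t acquire() {
        const std::uint64_t ticket = next_ticket_.fetch_add(1);
        std::uint64_t expected = 0;
        while (!owner_.compare_exchange_weak(expected, ticket)) {
            expected = 0;  // a failed exchange overwrites expected, so reset it
        }
        return ticket;
    }

    // Normal release. Fails if this hold is no longer the current one.
    bool release(std::uint64_t ticket) {
        std::uint64_t expected = ticket;
        return owner_.compare_exchange_strong(expected, 0);
    }

    // What the monitor would do: free the lock no matter who holds it.
    void force_release() { owner_.store(0); }

    bool held_by(std::uint64_t ticket) const { return owner_.load() == ticket; }

private:
    std::atomic<std::uint64_t> owner_{0};
    std::atomic<std::uint64_t> next_ticket_{1};
};

struct Outcome {
    bool slow_still_held;   // did the slow holder still own the lock on wake-up?
    bool slow_release_ok;   // did the slow holder's release succeed?
    int mystuff;            // final value of the "protected" data
};

// Simulates, in one thread, the interleaving described in the answer.
Outcome demo_forced_release() {
    RevocableLock m;
    int mystuff = 0;

    const std::uint64_t slow = m.acquire();   // slow holder takes the lock
    // ... slow holder is descheduled or sleeps for a long time ...
    m.force_release();                        // monitor decides it is stuck
    const std::uint64_t other = m.acquire();  // another worker gets the lock
    mystuff += 10;                            // that worker is mid-update

    // The slow holder wakes up, unaware that its hold was revoked.
    const bool still_held = m.held_by(slow);
    mystuff++;                                // unguarded access to shared data
    const bool release_ok = m.release(slow);  // release of a lock it lost

    const bool other_ok = m.release(other);
    assert(other_ok);                         // the legitimate holder is fine
    (void)other_ok;
    return Outcome{still_held, release_ok, mystuff};
}
```

The only way the slow holder could notice the revocation is to check its ticket before every access to the protected data, which amounts to redesigning the protocol rather than adding a monitor on top of it.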

Peeter Joot
  • True, but since we are writing the code, we know that each process should not hold the lock for more than a few thousand cycles. So it'd be nice to have a monitor process to make sure that no process holds the lock for too long. – Jin Nov 09 '12 at 23:12
  • Are you running with real-time scheduling guarantees? Otherwise (and perhaps even then), you can't always control why you are context switched out and not given CPU time again after being rescheduled. Example: a spike in system activity due to a sudden unregulated workload that drives the system's CPU utilization near capacity, causing very long reschedule delays. – Peeter Joot Nov 10 '12 at 00:35
  • Good point, and no, I am not running with any kind of scheduling guarantees. Still, is there no way of telling whether the process holding the lock is simply delayed or has been killed via SIGSEGV? – Jin Nov 12 '12 at 16:43