
I am looking for advice or pointers on how to design a specific part of a Mac OS X network kernel extension, written in C, to be as fast and efficient as possible.

Description: I have two sets of TAILQ lists, one for type A structures and the other for type B structures. Mostly I deal with them separately, and thus have an lck_mtx for each one. Sometimes I need to modify A, then B, then both at the same time. It looks like this:

Function1:
{
Modify List or Struct A
Modify List or Struct B
Modify List or Struct A & B
Modify List or Struct B
Modify List or Struct A
}

Function2:
{
Modify List or Struct B
Modify List or Struct A & B
Modify List or Struct A
Modify List or Struct A & B
}

I am not familiar with the use of locks, so I see only two options:

  1. Use a single lock to protect both lists. That seems wasteful, since it would prevent a function modifying only A from running while a function modifying only B is running (and vice versa).

  2. Take both locks successively as needed, then release them. That would give me:

    Function1:
    {
        lck_mtx_lock(&lock_a);
        /* modify list or struct A */
        lck_mtx_unlock(&lock_a);

        lck_mtx_lock(&lock_b);
        /* modify list or struct B */
        lck_mtx_unlock(&lock_b);

        lck_mtx_lock(&lock_a);
        lck_mtx_lock(&lock_b);
        /* modify lists or structs A & B */
        lck_mtx_unlock(&lock_b);
        lck_mtx_unlock(&lock_a);

        lck_mtx_lock(&lock_b);
        /* modify list or struct B */
        lck_mtx_unlock(&lock_b);

        lck_mtx_lock(&lock_a);
        /* modify list or struct A */
        lck_mtx_unlock(&lock_a);
    }

I fear this would be quite expensive, taking all those locks. Is there a better way to protect cross-modifications of A and B?

Thanks for your advice.

Jean
  • If you don't do any CPU-intensive task while holding each lock, how about making the locks function-wide (lock A and B at the beginning and release them at the end)? Also, are you using file locks? Why not mutexes/semaphores? I believe those would be faster – dario_ramos Sep 25 '12 at 12:36
  • Anyway, to be really sure, you should profile your code. – dario_ramos Sep 25 '12 at 12:44
  • Thanks. I am not very familiar with locks. I use the lck_mtx_t kind of lock, along with lck_mtx_lock(…) and lck_mtx_unlock(…). I could indeed make them function-wide. As for profiling, the most interesting measurement, I guess, would be the sum of all the time spent, kext-wide, waiting for and acquiring locks. How can I measure that? – Jean Sep 25 '12 at 18:21
  • Ok, those look like typical kernel-level mutexes, so that's OK. I never worked on a Mac, but on Linux I use gprof and on Windows, AQTime. Anyway, any decent profiling tool can measure the functions on which the highest % of CPU cycles are consumed. If the functions for acquiring and releasing locks are not at the top of that list, then the bottleneck, if there is one, lies elsewhere. – dario_ramos Sep 25 '12 at 19:38
  • [Instruments](http://stackoverflow.com/questions/11445619/profiling-c-on-mac-os-x) appears to be the profiler of choice in the Mac ecosystem. – dario_ramos Sep 25 '12 at 19:40
  • I'll echo @dario_ramos's advice here: unless you're spending a substantial amount of time between each locked block, or you're making calls into external code, you're likely best off just locking both data structures for the whole function. Another thing to watch out for: if you lock multiple mutexes, you must *always* lock them in the same order, e.g. first A then B. Mixing A-B and B-A will leave you open to deadlocks in most cases. – pmdj Sep 26 '12 at 09:06
  • Thank you all for your replies. dario_ramos: Yes, I decided to use lck_mtx_t after reading broad documentation on locking mechanisms. All: But I have never had the chance to use them in such a timing-sensitive context before, and I wouldn't want my kext to be the bottleneck of the kernel's network stack. So advice such as "you're better off locking a full function rather than taking locks several times" is really helpful. pmjordan: Actually, I do place external calls to BSD functions. It is more like: mod. A, various calls, mod. B, etc… And I can't change the order of the calls. – Jean Sep 26 '12 at 10:15
  • @Jean: in that case, it sounds as if what you've got is already pretty good. I would avoid calling out of your code while holding a lock if at all possible. If acquiring/relinquishing locks is costing you a lot of time compared to time actually spent on code in the lock, consider using a spinlock instead. If contention is your problem, your data structure is probably not optimal. – pmdj Sep 26 '12 at 10:35

1 Answer


Initially, I was looking for design advice in the context of the question asked above.

Well, it turns out that though using kernel-level mutexes seems to be the right call, there is no clear-cut answer. Too much depends on the exact structure of the functions and on the time spent doing side work that does not need to be carried out in a locked context.

In such situations, the best approach, as hinted by dario_ramos and pmjordan, is to profile the viable options and make a choice.

Thank you all for your help.

Jean
  • And my guess is that profiling a kernel extension is not exactly a walk in the park. So I could first begin by summing up all the nanoseconds actually spent acquiring the locks, and dead-load the network stack from userspace in a reproducible manner. The configuration that statistically minimizes the waiting time wins. – Jean Sep 26 '12 at 10:36