3

I'm not sure the question is easily understandable, so I'll start with a (faulty) code of what I would like to do.

...
{
    Time start = getCurrentTime();
    scoped_lock<MutexType> lock(mutex);
    Time lockWait = getCurrentTime() - start;

    // Protected section
    ...
}

This code does not work because nothing guarantees that the thread will not be preempted somewhere between the start time save instruction and the mutex lock, in which case the computed lockWait would be wrong (overestimation).

I can not think of any solution to get around this issue. I feel that I need to get my hands dirty in multithread low level mechanisms, but I do not know how. Would you have any solution or pointer?

So I'm interested in the wait time, obviously, but I also would like to know if the lock is contested or not, which I'm not sure I can obtain with time measurements only. If not, how could I get infos about that?

PS. I've seen there exists some tools that provide some measurements, such as mutrace or even Walgrind, but unfortunately it is required for my measurements to be integrated to the application source code.

Silverspur
  • 891
  • 1
  • 12
  • 33
  • 1
    This is not an answer, but it is likely that the code will not be preempted between when the lock is acquired and when the timer stops. When you fail to acquire a lock, your thread typically goes to sleep (if it didn't you'd be in trouble on a single-core system). This means that when you get the lock, either you got it on the first try, or your thread is fresh out of sleep and unlikely to get interrupted before reaching getCurrentTime(). There aren't that many instructions between the two. Is there any particular reason you need the exact value all the time ? You could subclass the mutex. – tux3 Jan 09 '15 at 00:42
  • 2
    Do you have a license for Intel VTune? – user14717 Jan 09 '15 at 00:50

2 Answers2

4

Short answer: All timing is estimation, so it's not possible to exactly measure this in the way you want.

Long answer: If what you're trying to do is to measure the speed of acquisition of an uncontested lock, then the accurate way to do this is to just count instruction cycles in a simulator. However, second best is to run the test many thousands of times (so that the effect of preemption is minimised) and then average. (Really, the preemption doesn't matter, since all user-mode code is subject to it. you can't prevent preemption in user-mode, and the timing effect will tend to zero over a large number of samples) That is what almost all timing tools do. You could minimise the likelihood of preemption (and thus improve the estimate) by running on a lightly-loaded multi-CPU system.

If instead you're interested in real program metrics (e.g. also in cases where there is the possibility of the lock being contested) then the same recommendation applies.

bleater
  • 5,098
  • 50
  • 48
  • Ok. But then, if timing can only be done statistically, I can not get from the mean wait time the number of times the lock has been contested. Then how could I get that ? [I've edited the question to reflect that I'm not only interested in the wait time but also in the contested/not contested status.] – Silverspur Jan 09 '15 at 09:15
  • 1
    If you're using boost, you can use a constructor for the lock which invokes try_lock, and examine the result code: `bool locked = boost::mutex::scoped_lock lock(mutex, boost::try_to_lock); if (locked) /* lock was uncontested */; else /* lock was contested */;` – bleater Jan 12 '15 at 02:47
2

I upvoted @bleater's answer for your first question. A minor suggestion is that in some cases, you might be more interested in knowing the minimum (or more intriguingly the minimum above floor) instead of the average. Or it could be the maximum or some percentile. Which one is more meaningful depends on your purpose.

Regarding your second question: finding the number of times the lock is contested, you can use an atomic counter.

The usage will be almost exactly like they way you measure time: increment and save the resulting value before trying to acquire the lock, and decrement upon leaving the scope.

Whenever this value is greater than 1, you have a contention.

Keep in mind that the C++ std::atomic<T> does not mandate its library implementations to use the low-overhead CPU-provided atomic instructions. Therefore you may have to take a look at the generated disassembly to ensure that it is using the low-overhead CPU instructions.

Low-level timing requires a great deal of care for correct results. If you use TSC (RDTSC) for CPU-level time measurement, you might also need to:

  • Disable SpeedStep (Intel EIST)
  • Disable TurboBoost
  • Make sure your CPU and platform supports invariant TSC
  • And finally, filter bad timing results if it turns out a thread has been preempted and then migrated to another CPU core, because TSC readings aren't guaranteed to be coherent between CPU cores even with the greatest amount of care.

(After all, there is a physical distance between the cores, and electrical signals travel slower than light ...)

Finally, the results (be it the contest counts, time spent waiting, or the raw timestamped traces from each thread) need to be stored somehow, and it is a good idea to store in some form of thread-local storage to prevent cache contentions. Again, to ensure correct result, one has to look at the low-level code as well as measuring the overhead to ensure it is doing the right thing.

Community
  • 1
  • 1
rwong
  • 6,062
  • 1
  • 23
  • 51