How efficient is a try_lock on a mutex?

Question

How efficient is a try_lock on a mutex? I.e. how much assembler instructions are there likely and how much time are they consuming in both possible cases (i.e. the mutex was already locked before or it was free and could be locked).

In case you have problems to answer the question, here is a how to (in case that is really unclear):

If that answer depends a lot on the OS implementation and hardware: Please answer it for common OS`s (e.g. Linux, Windows, MacOSX), recent versions of them (in case they differ a lot from earlier versions) and common hardware (x86, amd64, ppc, arm).

If that also depends on the library: Take pthread as an example.

Please also answer if they really differ at all. And if they differ, please state the differences. I.e. what do they do differently? What common algorithms are there around? Are there different algorithms around or do all common systems (common by the above list if that is unclear) have implemented mutexes just in the same way?

As of this Meta discussion, this really should be a separate question.

Also, I have asked this as a separate question from the performance of a lock because I am not sure if try_lock may behave different. Maybe also depending on the implementation. Then again, please answer it for common implementations. And this very similar/related question obviously shows that this is an interesting question which can be answered.

Was it really necessary to start an entirely new question that is almost identical to the one you asked 2 minutes ago? Especially given that this question suffers from exactly the same problems I identified in your last question. — Gian, Sep 06 '10 at 14:06
@Gian: Well, it is a different question, isn't it? Normally, it helps to keep separate questions separate as far as possible. Also, the answer to this question may be totally independent to the other one. Do you know if that is not the case? I don't know it and I would think it is not. Please answer the question then. :) — Albert, Sep 06 '10 at 14:13
@Gian: Read here: http://meta.stackexchange.com/questions/39223/one-post-with-multiple-questions-or-multiple-posts — Albert, Sep 06 '10 at 14:14
They are the same question with respect to how much detail you have included in each question. It's reasonable to say that in both instances the answer is "it depends on the implementation". — Gian, Sep 06 '10 at 14:49
@Gian: No it is not? `try_lock` is a different thing than `lock`. It might be possible that `try_lock` is more / much more expensive than `lock`. I don't know that. That is why I am asking here. If you know that it is the case that they are in fact the same, please answer this. If you know implementations where they are cheap and others which are expensive, please say that. — Albert, Sep 06 '10 at 21:38
It's not the responsibility of volunteers on this site to guess what on earth you mean in a question. I already explained in the other question: `try_lock` might take 1 cycle or it might take a billion. Without reference to some kind of frame, there is no way to make meaningful comparisons. — Gian, Sep 07 '10 at 07:54
@Gian: An answer could go like this: *On Linux on a x86, it behaves like .... But Windows is much different, so it is like ... Other Unixes behaves mostly like Linux. Other architectures may behave different though. You can look that up in the Linux kernel at ... `try_lock` behaves different than `lock` on most systems because ...* — Albert, Sep 07 '10 at 11:09
Those aren't even reasonable statements to make. Linux 2.6.21 on Intel Core 2 Duo of a particular batch may perform completely differently to anything else. It's not reasonable to expect anyone to enumerate all elements of the cross-product of operating systems, threading implementations and processors. If you were asking what kinds of factors influence efficiency, then that's a very different question to the one you asked. — Gian, Sep 07 '10 at 11:47
@Gian: When I am asking about the efficiency and it is in fact dependent on a bunch of factors, then that list and the explanation how it is dependent is a perfect answer to the question. I don't know if it really is so different. That is why I am asking. So you know that the Core2Duo has a different implementation in the Linux kernel than for example the AMD CPU? I don't know this. I thought it is not and all x86 architectures only have one single implementation. If I would know about the answer and how it can be answered, I would not ask. — Albert, Sep 07 '10 at 16:42
Hello, has anyone read [this](http://meta.stackexchange.com/questions/39223/one-post-with-multiple-questions-or-multiple-posts)? So why is it closed as *exact duplicate* if it is not a duplicate? — Albert, Sep 09 '10 at 15:53
Those who have closed this as exact duplicate: Can you at least answer what I asked here and what I have not asked in the other question, which was already answered and which answer does not answer this question here? — Albert, Sep 09 '10 at 16:03
I believe I made the argument as to why I believed it was an exact duplicate, and it appears that others agreed with me. That meta post talks about independent questions in the same domain. Your questions were identical with respect to how much detail you included and the way you restricted the scope of each (not at all). — Gian, Sep 10 '10 at 19:41
@Gain: Do you fail to see that `try_lock` does something different than `lock`? Why is the question invalid if the performance is different? So you think it is the same thing? If it is, why don't you just answer that? — Albert, Sep 11 '10 at 11:29

score 2 · Answer 1 · answered Sep 08 '10 at 15:10

A mutex is a logical construction that is independent of any implementation. Operations on mutexes therefore are neither efficient nor inefficient - they are simply defined.

Your question is therefore akin to asking "How efficient is a car?", without reference to what kind of car you might be talking about.

I could implement mutexes in the real world with smoke signals, carrier pigeons or a pencil and paper. I could also implement them on a computer. I could implement a mutex with certain operations on a Cray 1, on an Intel Core 2 Duo, or on the 486 in my basement. I could implement them in hardware. I could implement them in software in the operating system kernel, or in userspace, or using some combination of the two. I might simulate mutexes (but not implement them) using lock-free algorithms that are guaranteed conflict-free within a critical section.

EDIT: Your subsequent edits don't help the situation. "In a low level language (like C or whatever)" is mostly irrelevant, because then we're into measuring language implementation performance, and that's a slippery slope at best. "[F]rom pthread or whatever the native system library provides" is similarly unhelpful, because as I said, there are so many ways that one could implement mutexes in different environments that it's not even a useful comparison to make.

This is why your question is unanswerable.

Well than just consider the *current* implementation on Linux, MacOSX and Windows. And please describe that/why that implementation fundamentally are different from earlier/other implementations. Please give some examples in what different ways it is actually implemented in practice. Because I doubt that there are so much fundamentally different implementations around in common (Linux/MacOSX/Win) OS on common (x86,amd64) hardware. — Albert, Sep 09 '10 at 15:52
I'm sorry, my contract rates are available on request. Otherwise you can take my answer or leave it :) I've devoted enough time to this subject matter now, I think. Perhaps someone else will provide an answer with a full comparative analysis that you could either perform yourself or search for research literature. — Gian, Sep 09 '10 at 18:39
You don't need to answer this if you feel that I ask for too much information and your knowledge does not cover enough to be able to answer it. — Albert, Sep 11 '10 at 11:30

How efficient is a try_lock on a mutex?

1 Answers1

Linked