Lock free linked list is performing worse than the locked counterpart

Question

I am trying to compare the performance of locked and lock free linked list data structure. I have implemented this algorithm for lock free linked list. Both the programs are implemented in C.

I am testing for 4 threads. Each thread has 1000 insert operations.

I am using Intel PCM tool to measure the performance.

Here are my findings: Lock free:

Median of L3 Misses=1012699
Median of L2 Misses=1479741
Median of L3 Hits=484128
Median of L2 Hits=1797537
Median of Time=1.80696
Median of Cycles=5296042019
Median of IPC=1.536135
Median of Bytes Read=444423232
Median of Bytes Written=25414144

Locked:

Median of L3 Misses=711796.5
Median of L2 Misses=1517899
Median of L3 Hits=819408.5
Median of L2 Hits=2282527
Median of Time=0.244517
Median of Cycles=894265192
Median of IPC=0.8495695
Median of Bytes Read=174872576
Median of Bytes Written=24722912

The locked version performs better on every count except IPC. Is this what is supposed to happen? Or the lock Free data structure should perform better?

If yes, then what is the benefit of using lock free data structures? Any help will be appreciated.

Go to this page:https://stackoverflow.com/questions/33083270 On that page is a youtube video link to a talk given at CppCon about lockless implementations and tradeoffs. The video was split into two parts and the link is for part 2. So, find part 1. The video, in total, is about 2 hours, but the speaker is quite entertaining and really knows his stuff — Craig Estey, Jul 12 '16 at 20:23
This [answer](http://stackoverflow.com/a/4044614/5781248) might explain the performance issue (contention). Lock free data structures probably help to avoid expensive context switches. — J.J. Hakala, Jul 12 '16 at 20:38
You are not seeing the real problem. The cache hit rate is absolutely dreadful, a standard problem with traditional linked lists on a modern processor. A very important detail that these old papers ignore. Don't use them. — Hans Passant, Jul 12 '16 at 21:57

John Bollinger · Accepted Answer · 2016-07-12T21:44:40.687

2

The locked version performs better on every count except IPC. Is this what is supposed to happen? Or the lock Free data structure should perform better?

Generally speaking, which will perform better is a function of both workload details and implementation details. The paper you reference says

Lock-free data structures also have the potential to have better performance

(emphasis added), but it does not promise better performance in every case. Although multiple threads may be able to modify the lock-free data structure at the same time, each modification involves more operations even when there are no conflicts. When there are conflicts, performance degrades.

I observe also that your lock-free code has a higher proportion of cache misses than does your locked code. Although I cannot confidently explain that, I can think of at least two plausible reasons why that would be an expected consequence of the lock-free implementation. Naturally, less effective cache usage significantly degrades performance.

If yes, then what is the benefit of using lock free data structures?

The paper says that the primary benefit is:

If an implementation is lock-free, delays or failures of individual processes do not block the progress of other processes in the system.

edited Jul 12 '16 at 21:44

answered Jul 12 '16 at 20:56

John Bollinger

160,171
8
81
157

Thanks for the answer. Any help on how to reduce the cache misses? – Sumant Jul 13 '16 at 21:48
@Sumant, cache hit rate is mostly a function of data access patterns, which themselves are a function of workload and implementation details, just like overall performance is. Where possible, you would prefer to scan linearly through memory and to avoid jumping around. One way you might facilitate that is by allocating list nodes in medium-sized groups instead of one at a time. Laying out your structure members to minimize padding may help a bit, too. – John Bollinger Jul 13 '16 at 21:55
Overall, however, you are very unlikely to make up the observed performance difference between the two approaches with such tricks. If performance is your governing priority, then test each approach with a workload that simulates the expected one, and choose the one that performs better. – John Bollinger Jul 13 '16 at 21:58
I am doing as part of my Grad project. My aim is to squeeze most performance out of lock free linked list. I am allocating the space of a node on every insert operation. Any tips on how I can optimize the code? Let me know if the the link to the code. – Sumant Jul 13 '16 at 22:19
@Sumant, if this is a graduate project, then it would be appropriate to discuss optimization issues with your graduate advisor. Additionally, if your code works, as seems to be the case, then you could consider posting it for review over at [CodeReview](http://codereview.stackexchange.com/). If you opt for that, however, then do be careful to read and follow their rules, which differ a bit from ours here at SO. – John Bollinger Jul 13 '16 at 23:15

Lock free linked list is performing worse than the locked counterpart

1 Answers1