1

I would like to know if the erasure of elements from list is considered to be "write" operation, i.e. is the erasure of elements in multiple threads thread-safe?

For example, I have a list with more than a 100k elements and, in order to speed up erasing elements from it based on some condition, I would like to split it into as many parts as there are available threads. Then, each thread would check its part and erase specific elements that satisfy some condition. Is this safe to do?

Here is my simple example (note: this is simplified case, I am aware of some edge cases):

#include <list>
#include <vector>
#include <thread>
#include <iostream>
#include <algorithm>

int main() {

    constexpr size_t number_of_threads = 2;

    std::list<unsigned int> elements = { 1, 2, 3, 4, 4, 5, 6, 7};
    std::vector<std::thread> threads;

    size_t elements_per_thread = elements.size() / number_of_threads;

    for (size_t i = 0; i < number_of_threads; i++) {
        auto elements_begin = std::next(std::begin(elements), i * elements_per_thread);
        auto elements_end   = std::next(elements_begin, elements_per_thread);

        threads.emplace_back(
            [&elements, elements_begin, elements_end]() {
                elements.erase(std::remove_if(elements_begin, elements_end, [](unsigned int const& x) {
                    return x == 4;
                }), elements_end);
            }
        );
    }

    for (auto& thread : threads) {
        thread.join();
    }

    for (auto const& item : elements) {
        std::cout << item << " " << std::endl;
    }

    return 0;
}

This would output correct result:

1
2
3
5
6
7

Thank you in advance

NutCracker
  • 11,485
  • 4
  • 44
  • 68
  • 2
    I would assume no. Since `std::list` is a doubly linked list there may be a situation where the pointers back and forward are modified by two threads – Sami Kuhmonen Jun 27 '19 at 07:51
  • 1
    Yes, I agree with you. That's why I, intentionally, created list with two adjacent elements that satisfy condition, last one in the first thread and the first one in the second thread. – NutCracker Jun 27 '19 at 07:55
  • 2
    There is no explicit thread-safety in Standard C++ containers. – Robert Andrzejuk Jun 27 '19 at 08:01
  • @RobertAndrzejuk, there is: https://en.cppreference.com/w/cpp/container#Thread_safety – Evg Jun 27 '19 at 08:09
  • @George that is basically what I have done here, right? – NutCracker Jun 27 '19 at 08:13
  • @NutCracker are we sure, this way, we can see, from tread aspect, only states before or after erasing elemnts from structure. Considering we need to arrange pointers after deletion, on larger number of elements, not all of can be placed in cache line, so we need to enforce atomicity some way. There is where we need to use sync primitives I think. – Boki Jun 27 '19 at 08:28
  • I don't think `erase()` is needed here and used correctly. Simple `elements.remove_if()` should be enough. – sklott Jun 27 '19 at 08:29
  • I am using C++11 so `remove_if` member function is still not defined there. – NutCracker Jun 27 '19 at 08:34
  • According to this table https://en.cppreference.com/w/cpp/container#Member_function_table it is defined in C++03 – sklott Jun 27 '19 at 08:37
  • Oh, yes, you are right. Sorry – NutCracker Jun 27 '19 at 08:45
  • You are passing exactly one argument to [`std::list::erase`](https://en.cppreference.com/w/cpp/container/list/erase). Is this a typo? Probably you also want to pass `elements_end` as a second argument, right? – Julius Jun 27 '19 at 14:41
  • @Julius Actually, `std::list::erase` also accepts only one argument, i.e. iterator to the element to be removed. – NutCracker Jun 27 '19 at 15:59
  • @NutCracker: Yes, that is correct: That single-argument overload removes exactly one element. That is bad if `std::remove_if` removes zero elements or more than one element. – Julius Jun 27 '19 at 16:14
  • Yes, you are right. That would lead to undefined behavior. – NutCracker Jun 27 '19 at 16:21

2 Answers2

0

For example, I have a list with more than a 100k elements and, in order to speed up erasing elements from it based on some condition, I would like to split it into as many parts as there are available threads. Then, each thread would check its part and erase specific elements that satisfy some condition. Is this safe to do?

After reading the following thread-safety note, I am convinced that concurrent calls to std::list::erase are unsafe:

  1. [...] Container operations that invalidate any iterators modify the container and cannot be executed concurrently with any operations on existing iterators even if those iterators are not invalidated.

For completeness, here is what cppreference.com says about reference/iterator invalidation by std::list::erase:

References and iterators to the erased elements are invalidated. Other references and iterators are not affected.

Have you considered to splice the huge list into one small list per thread? Then each thread could use remove_if on its own list before you synchronize and un-splice.

Anyway, a std::list with 100k elements sounds like a way to purposely decrease the performance. Are you performing an experiment, or what is the reason for using std::list here?

Julius
  • 1,816
  • 10
  • 14
  • IMO splicing the list and then merging all small lists together would just lower down the performance. I am working with existing code but, to conclude, maybe the best thing would be to replace `std::list` with a more appropriate container. – NutCracker Jun 27 '19 at 16:04
  • @NutCracker: You are right about the `splice` overhead. I did not expect that the `splice` overload with a range has linear complexity. The [reason for linear complexity of that `splice` overload](https://stackoverflow.com/a/10134870/2615118) is interesting. – Julius Jun 27 '19 at 16:22
0

Erasing an element from a list is indeed a "write" operation.

Some next/previous pointers must be changed and one of the nodes will be deallocated. For example, take the list A <-> B <-> C

The code to remove B look like this or equivalent:

A->next = C
C->prev = A
delete B->data
delete B

These are write operations and not thread-safe by default. Even if the ranges to be erased are different, a race can occur at the range boundaries.

Standard containers are not thread-safe. (This goes for most programming languages). Thread synchronization is expensive and this cost would also impact non-multithreaded code. You shouldn't have to pay for something you're not using. Additionally, since multithreading is an optimization, it's difficult for the data structure designer to know how to optimize without knowing the access pattern. (Although in your case, this is a general access pattern).

If you have C++17, try the std::remove_if function ExecutionPolicy overload. https://en.cppreference.com/w/cpp/algorithm/remove. This should be available in GCC 9 (link with -ltbb) and MSVC 19.14 (VS 2017 15.7) according to https://en.cppreference.com/w/cpp/compiler_support. MSVC actually does parallelize the function (https://devblogs.microsoft.com/cppblog/using-c17-parallel-algorithms-for-better-performance/). I believe GCC does too. Regarding execution policy (https://en.cppreference.com/w/cpp/algorithm/execution_policy_tag_t), last time I checked, the difference between sequenced/unsequenced wasn't implemented in MSVC.

I can see from your example, you already know how to use std::remove_if:

Removing is done by shifting (by means of move assignment) the elements in the range in such a way that the elements that are not to be removed appear in the beginning of the range. Relative order of the elements that remain is preserved and the physical size of the container is unchanged. Iterators pointing to an element between the new logical end and the physical end of the range are still dereferenceable, but the elements themselves have unspecified values (as per MoveAssignable post-condition). A call to remove is typically followed by a call to a container's erase method, which erases the unspecified values and reduces the physical size of the container to match its new logical size.

subsequent example:

std::string str1 = "Text with some   spaces";
str1.erase(std::remove(str1.begin(), str1.end(), ' '), str1.end());

https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom

Lastly, you mentioned in some comments that you were considering getting rid of std::list. Bjarne Stroustrup advocates for using std::vector by default. This is because the array can outperform a linked list even in situations that require O(N) shift operations on the array! ("Can"... you should see for yourself which one is faster)

https://isocpp.org/blog/2014/06/stroustrup-lists

https://www.youtube.com/watch?v=YQs6IC-vgmo

If you don't have C++17, switching to an std::vector would also make it easier to parallelize the erasing, since the array backing the vector doesn't have any moving parts. Caveats:

  • This of course depends on how you do it, as multiple threads moving things around will create races.
  • There might be workload imbalance depending on how you split the array.
  • You also have to do a calculation to avoid so called "false sharing"
Humphrey Winnebago
  • 1,512
  • 8
  • 15