I only have experience with the likes of concurrent_vector in Intel's Threading Building Blocks and Microsoft's Parallel Patterns Library, but I would think they are probably comparable to Java's Vector.
From my tests, concurrent_vector in both is quite a bit slower than most alternatives, like executing multiple threads that gather results in local, thread-unsafe containers (like std::vector in C++) and then append the results to a shared collection inside a lock, or creating a list of lists with each thread writing to its own list (using some array index) and then combining the results into a single list, serially, at the end.
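Here's a minimal sketch of that first approach in Java, since the question is Java-oriented. The class name and counts are made up for illustration; the point is that each thread appends to a private ArrayList with no synchronization at all, and the lock is taken only once per thread for the final merge rather than once per element:

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalGather {
    public static void main(String[] args) throws InterruptedException {
        final int threadCount = 4;
        final int itemsPerThread = 250_000;

        // Shared result container, only ever touched inside the synchronized block.
        final List<Integer> combined = new ArrayList<>(threadCount * itemsPerThread);

        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Thread-unsafe local container: no synchronization while producing.
                List<Integer> local = new ArrayList<>(itemsPerThread);
                for (int i = 0; i < itemsPerThread; i++) {
                    local.add(id * itemsPerThread + i); // stand-in for real work
                }
                // One coarse lock per thread to merge, instead of one per element.
                synchronized (combined) {
                    combined.addAll(local);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("Combined size: " + combined.size());
    }
}
```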
The benefit of the thread-safe version is convenience, as far as I can tell. I have never found a concurrent, thread-safe random-access sequence, even from Intel's own library, that I can't beat with thread-unsafe alternatives where I accumulate results locally in each thread and then combine them serially, in one thread, using some basic thread synchronization. If my tests are write-heavy, I can often get results 2-3 times faster with this crude method than with the concurrent container.
That said, it's probably rare for your time to actually be skewed towards writing/appending elements to a container. Far more often, I find the bulk of my real-world threaded cases is spent reading and crunching numbers and the like, with only a small portion of the time spent pushing the results to the back of a container. So very often the overhead of a concurrent container becomes negligible, and it's certainly a whole lot more convenient and far less prone to misuse across threads.
In the probably-rare case where writing to a container takes a good chunk of the time spent in a parallel algorithm, concurrent containers have never offered a performance benefit over the crude alternatives in my tests and experience. So if it's a particularly performance-critical section of code, and you're seeing hotspots in the methods of the concurrent container in your profiling sessions, I'd try alternatives that accumulate output in thread-local containers that aren't concurrent (e.g., a thread-local ArrayList) and then combine all the results at the end of the algorithm (e.g., into a combined ArrayList), either serially from one thread or inside a lock/critical section.
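A sketch of the list-of-lists flavor of this, again in Java with hypothetical names and sizes: each worker owns one plain ArrayList selected by index, so there are no locks at all during the parallel phase, and the join() calls provide the visibility guarantee before the serial combine:

```java
import java.util.ArrayList;
import java.util.List;

public class ListOfListsGather {
    public static void main(String[] args) throws InterruptedException {
        final int threadCount = Runtime.getRuntime().availableProcessors();
        final int itemsPerThread = 250_000;

        // One private, non-concurrent output list per thread, indexed by thread id.
        final List<List<Integer>> perThread = new ArrayList<>(threadCount);
        for (int t = 0; t < threadCount; t++) {
            perThread.add(new ArrayList<>(itemsPerThread));
        }

        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final List<Integer> mine = perThread.get(t); // no sharing, no locks
            workers[t] = new Thread(() -> {
                for (int i = 0; i < itemsPerThread; i++) {
                    mine.add(i * i); // stand-in for real computed results
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // establishes happens-before for the workers' writes
        }

        // Serial combine from one thread after all workers have finished.
        List<Integer> combined = new ArrayList<>(threadCount * itemsPerThread);
        for (List<Integer> part : perThread) {
            combined.addAll(part);
        }
        System.out.println("Combined size: " + combined.size());
    }
}
```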
Unfortunately, it's tricky to make things both thread-safe and maximally scalable. You could make your architecture maximally thread-safe by just executing everything in one thread. Thread-safety solved! But that doesn't scale at all to take advantage of parallelism. I find concurrency to be a balancing act like that, headbutting against Amdahl's Law. Concurrent containers sit in the middle: using them almost always sacrifices optimal performance to some degree, but the difference between optimal and not-quite-optimal might be quite negligible. You measure and see, as I see it.
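For concreteness, Amdahl's Law puts a hard ceiling on this: if a fraction p of the work parallelizes across N cores and the rest (the locked or serial combine, say) does not, the best possible speedup is:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
```

With p = 0.95, even infinitely many cores cap you at 1 / (1 - 0.95) = 20x, which is why shrinking the serialized portion (per-element locking inside a concurrent container, for instance) matters so much.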
As for this part of your question:
But due to the multiple Threads modifying and accessing it I think
that the integrity of data may not be secured
That's design-related, from my perspective. It's been a good number of years since I thought about this from a theoretical computer science standpoint, but there's an implicit assumption here that this data must be shared. Depending on the user-end requirements, the data may or may not actually need to be shared. Take a video game accessing data from a scene. It might seem that the rendering engine and physics engine and so forth must all share the same scene. That makes intuitive sense. That makes human sense.
Yet that's not necessarily the case. From a user-end standpoint, it might not matter if the renderer has its own copy of the scene that it uses to render results to the screen, one which might be slightly out of sync with everything else going on. So there are often times, at least when optimal performance or frame rates or minimal waiting are required, where you can duplicate/copy data to let threads go as fast as they can, with no thread synchronization (not even atomic operations) required to access shared data. This can get very elaborate, with persistent data structures and the like, or it can stay simple; it depends on the design.

But I think one counter-intuitive aspect of multithreaded programming is that we're first tempted to think of more things as needing to be shared among threads than really do, and discarding that assumption can produce a whole new degree of parallelism. I've found, with many of my colleagues and myself, that having concurrent containers at our fingertips tempts us to share more data between threads than we really have to. If utilizing the hardware as effectively as possible is the goal, then it often makes sense to abandon the notion of shared data as much as we can, including these concurrent containers. I would actually suggest minimizing their usage if your software is as performance-critical as a game engine.
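As a rough illustration of the copy/snapshot idea (all names here are hypothetical, and this trades the "no synchronization at all" ideal for one cheap atomic publish per frame rather than per-access locking): the simulation builds an immutable snapshot of the scene data and swaps it in, and the renderer reads whatever snapshot is current, possibly a frame behind, without ever blocking the simulation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class SceneSnapshotDemo {
    // An immutable snapshot of whatever the renderer needs. All fields are
    // final, so once constructed it can be read from any thread without locks.
    static final class SceneSnapshot {
        final long frame;
        final List<String> drawables; // stand-in for real scene data

        SceneSnapshot(long frame, List<String> drawables) {
            this.frame = frame;
            this.drawables = List.copyOf(drawables);
        }
    }

    // The only point of contact between simulation and renderer: one atomic
    // reference swap per frame, instead of locking on every scene access.
    static final AtomicReference<SceneSnapshot> latest =
            new AtomicReference<>(new SceneSnapshot(0, List.of()));

    public static void main(String[] args) throws InterruptedException {
        Thread simulation = new Thread(() -> {
            for (long frame = 1; frame <= 5; frame++) {
                // Build a fresh snapshot, then publish it in a single swap.
                latest.set(new SceneSnapshot(frame, List.of("player@" + frame)));
            }
        });

        Thread renderer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                // Grab whatever is current; it may be a frame or two behind,
                // which is acceptable under the stated requirements.
                SceneSnapshot s = latest.get();
                System.out.println("render frame " + s.frame + ": " + s.drawables);
            }
        });

        simulation.start();
        renderer.start();
        simulation.join();
        renderer.join();
    }
}
```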