I only have experience with the likes of concurrent_vector in Intel's Threading Building Blocks and Microsoft's Parallel Patterns Library, but I would think they are probably comparable to Java's Vector.
From my tests, concurrent_vector in both is quite a bit slower than most alternatives, like executing multiple threads that gather results in local, thread-unsafe containers (like std::vector in C++) and then append the results to a shared collection inside a lock, or creating a list of lists with each thread writing to its own list (using some array index) and then combining the results into a single list, serially, at the end.
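Here's a minimal sketch of that first approach in Java, since the question is Java-oriented. The class name and counts are made up for illustration; the point is that each thread appends to a private ArrayList with no synchronization at all, and the lock is taken only once per thread for the final merge rather than once per element:

```java
import java.util.ArrayList;
import java.util.List;

public class ThreadLocalGather {
    public static void main(String[] args) throws InterruptedException {
        final int threadCount = 4;
        final int itemsPerThread = 250_000;

        // Shared result container, only ever touched inside the synchronized block.
        final List<Integer> combined = new ArrayList<>(threadCount * itemsPerThread);

        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Thread-unsafe local container: no synchronization while producing.
                List<Integer> local = new ArrayList<>(itemsPerThread);
                for (int i = 0; i < itemsPerThread; i++) {
                    local.add(id * itemsPerThread + i); // stand-in for real work
                }
                // One coarse lock per thread to merge, instead of one per element.
                synchronized (combined) {
                    combined.addAll(local);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println("Combined size: " + combined.size());
    }
}
```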
The benefit of the thread-safe version is convenience, as far as I can tell. I have never found a concurrent, thread-safe random-access sequence, even from Intel's own library, that I can't beat with thread-unsafe alternatives where I accumulate results locally in each thread and then combine them serially, in one thread, using some basic thread synchronization. If my tests are write-heavy, I can often get results 2-3 times faster with this crude method than with the concurrent container.
That said, it's probably rare for your time to actually be skewed towards writing/appending elements to a container. Far more often, I find the bulk of my real-world threaded cases is spent reading and crunching numbers and the like, with only a small portion of the time spent pushing the results to the back of a container. So very often the overhead of a concurrent container becomes negligible, and it's certainly a whole lot more convenient and far less prone to misuse across threads.
In the probably-rare case where writing to a container takes a good chunk of the time spent in a parallel algorithm, concurrent containers have never offered a performance benefit over the crude alternatives in my tests and experience. So if it's a particularly performance-critical section of code, and you're seeing hotspots in the methods of the concurrent container in your profiling sessions, I'd try alternatives that accumulate output in thread-local containers that aren't concurrent (e.g., a thread-local ArrayList) and then combine all the results at the end of the algorithm (e.g., into a combined ArrayList), either serially from one thread or inside a lock/critical section.
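A sketch of the list-of-lists flavor of this, again in Java with hypothetical names and sizes: each worker owns one plain ArrayList selected by index, so there are no locks at all during the parallel phase, and the join() calls provide the visibility guarantee before the serial combine:

```java
import java.util.ArrayList;
import java.util.List;

public class ListOfListsGather {
    public static void main(String[] args) throws InterruptedException {
        final int threadCount = Runtime.getRuntime().availableProcessors();
        final int itemsPerThread = 250_000;

        // One private, non-concurrent output list per thread, indexed by thread id.
        final List<List<Integer>> perThread = new ArrayList<>(threadCount);
        for (int t = 0; t < threadCount; t++) {
            perThread.add(new ArrayList<>(itemsPerThread));
        }

        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final List<Integer> mine = perThread.get(t); // no sharing, no locks
            workers[t] = new Thread(() -> {
                for (int i = 0; i < itemsPerThread; i++) {
                    mine.add(i * i); // stand-in for real computed results
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join(); // establishes happens-before for the workers' writes
        }

        // Serial combine from one thread after all workers have finished.
        List<Integer> combined = new ArrayList<>(threadCount * itemsPerThread);
        for (List<Integer> part : perThread) {
            combined.addAll(part);
        }
        System.out.println("Combined size: " + combined.size());
    }
}
```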
Unfortunately, it's tricky to make things both thread-safe and maximally scalable. You could make your architecture maximally thread-safe by just executing everything in one thread. Thread-safety solved! But that doesn't scale at all to take advantage of parallelism. I find concurrency to be a balancing act like that, headbutting against Amdahl's Law. Concurrent containers sit in the middle: using them almost always sacrifices optimal performance to some degree, but the difference between optimal and not-quite-optimal might be quite negligible. You measure and see, as I see it.
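For concreteness, Amdahl's Law puts a hard ceiling on this: if a fraction p of the work parallelizes across N cores and the rest (the locked or serial combine, say) does not, the best possible speedup is:

```latex
S(N) = \frac{1}{(1 - p) + \frac{p}{N}}
```

With p = 0.95, even infinitely many cores cap you at 1 / (1 - 0.95) = 20x, which is why shrinking the serialized portion (per-element locking inside a concurrent container, for instance) matters so much.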
As for this part of your question:
But due to the multiple Threads modifying and accessing it I think
that the integrity of data may not be secured
That's design-related, from my perspective. It's been a good number of years since I thought about this from a theoretical computer science standpoint, but there's an implicit assumption here that this data must be shared. Depending on the user-end requirements, the data may or may not actually need to be shared. Take a video game accessing data from a scene. It might seem that the rendering engine and physics engine and so forth must all share the same scene. That makes intuitive sense. That makes human sense.
Yet that's not necessarily the case. From a user-end standpoint, it might not matter if the renderer has its own copy of the scene that it uses to render results to the screen, one which might be slightly out of sync with everything else going on. So there are often times, at least when optimal performance or frame rates or minimal waiting are required, where you can duplicate/copy data to let threads go as fast as they can, with no thread synchronization (not even atomic operations) required to access shared data. This can get very elaborate, with persistent data structures and the like, or it can stay simple; it depends on the design.

But I think one counter-intuitive aspect of multithreaded programming is that we're first tempted to think of more things as needing to be shared among threads than really do, and discarding that assumption can produce a whole new degree of parallelism. I've found, with many of my colleagues and myself, that having concurrent containers at our fingertips tempts us to share more data between threads than we really have to. If utilizing the hardware as effectively as possible is the goal, then it often makes sense to abandon the notion of shared data as much as we can, including these concurrent containers. I would actually suggest minimizing their usage if your software is as performance-critical as a game engine.
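As a rough illustration of the copy/snapshot idea (all names here are hypothetical, and this trades the "no synchronization at all" ideal for one cheap atomic publish per frame rather than per-access locking): the simulation builds an immutable snapshot of the scene data and swaps it in, and the renderer reads whatever snapshot is current, possibly a frame behind, without ever blocking the simulation.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class SceneSnapshotDemo {
    // An immutable snapshot of whatever the renderer needs. All fields are
    // final, so once constructed it can be read from any thread without locks.
    static final class SceneSnapshot {
        final long frame;
        final List<String> drawables; // stand-in for real scene data

        SceneSnapshot(long frame, List<String> drawables) {
            this.frame = frame;
            this.drawables = List.copyOf(drawables);
        }
    }

    // The only point of contact between simulation and renderer: one atomic
    // reference swap per frame, instead of locking on every scene access.
    static final AtomicReference<SceneSnapshot> latest =
            new AtomicReference<>(new SceneSnapshot(0, List.of()));

    public static void main(String[] args) throws InterruptedException {
        Thread simulation = new Thread(() -> {
            for (long frame = 1; frame <= 5; frame++) {
                // Build a fresh snapshot, then publish it in a single swap.
                latest.set(new SceneSnapshot(frame, List.of("player@" + frame)));
            }
        });

        Thread renderer = new Thread(() -> {
            for (int i = 0; i < 5; i++) {
                // Grab whatever is current; it may be a frame or two behind,
                // which is acceptable under the stated requirements.
                SceneSnapshot s = latest.get();
                System.out.println("render frame " + s.frame + ": " + s.drawables);
            }
        });

        simulation.start();
        renderer.start();
        simulation.join();
        renderer.join();
    }
}
```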