2

I want to have a list of objects that satisfies just a few basic requirements. It needs to support fast random access and it needs to be safe to use from multiple threads. Reads will dominate by far and ideally should be about as fast as normal ArrayList access, i.e. no locking. There is no need to insert elements in the middle, delete, or change the value at an index: the only mutation required is to be able to append a new element to the end of the list. More specifically a caller will specify an index at which an element should be placed, and the index is expected to be only a few more than the length of the list, i.e. the list is dense. There is also no need for iteration.

Is there anything that supports this in Java? It can be in a third party library.

If not I am thinking I will implement my own class. There'll be an internal array of arrays, each twice as big as the last. Lookups by index will do just a little more maths to figure out which array has the right element and what the index in that array is. Appends will be similar unless they go beyond the available space, in which case a new array is allocated. Only the creation of a new array will require a lock to be acquired.

Does this sound like a sensible approach?

This doesn't sound like a particularly novel data structure. Does it have a name?

Alex Hall
  • 34,833
  • 5
  • 57
  • 89

3 Answers3

2

Reads will dominate by far and ideally should be about as fast as normal ArrayList access, i.e. no locking.

CopyOnWriteArrayList generally works in this scenario because the cost of insertion will be amortized over the large number of cheap read accesses.

Under the condition it is append-only one could amortize it even further by pre-sizing the array and maintaining a separate length and bumping that atomically after an insert.

Other approaches are only necessary if you're concerned about peak latency for inserts. But that's not one of the criteria you mentioned.


You also have to keep in mind that you're asking for a data structure tailored to your use-case (append-only, lock-free, O(1) access, etc. etc.) whereas the JDK provides general-purpose data structures which make some tradeoffs to cover more use-cases.

There are 3rd-party libraries which provide more specialized implementations for limited use-cases.


The type of datastructure you describe is a spined buffer and used internally by the JDK in some places (e.g. in the form of java.util.stream.SpinedBuffer<E>), but that implementation is not thread-safe and not exposed since it does not implement collection APIs.

Its javadocs state:

One or more arrays are used to store elements. The use of a multiple arrays has better performance characteristics than a single array used by ArrayList, as when the capacity of the list needs to be increased no copying of elements is required. This is usually beneficial in the case where the results will be traversed a small number of times.

I.e. it's mostly useful for write-once, read-a-few-times scenarios where the allocation costs will dominate.

In read-heavy data-structures the cost of indirection, extra math operations and non-sequential memory access might actually outstrip the cost of occasional copying/reallocation.

the8472
  • 40,999
  • 5
  • 70
  • 122
1

Any list wrapped using Collections.synchronizedList(...) satisfies the requirements as you have stated them.

However:

  1. Insertion anywhere other than at the end of the list will be a concurrency bottleneck. The longer the list, the worse it will get.

  2. There are caveats in the javadocs about iteration that you should read.

CopyOnWriteArrayList is an alternative, but all updates on a copy-on-write list are O(N) irrespective of where you insert the element. This is expensive, and would be a concurrency bottleneck is there are multiple writers. The argument that the cost of updates can be ignored, only applies if the ratio of writes to reads reduces over time. If the ratio is constant over time, then you need to take the (O(N)) cost updates into account.

Note that a synchronized wrapper for an ArrayList will give O(1) lookup and (amortized) O(1) insertion at the end of the list. Admittedly, insertion into the middle of a list is O(N) ... but there is no list structure that I'm aware of that gives better than O(logN) for insertion at a random position. (Look up "indexable skiplist".)


UPDATE

You commented:

"I don't need random insertion, only appends, except that the position of the append can be beyond the end of the list. For example I might have a list [0,1,2] and want to insert a 4 at index 4 so my list will then be [0,1,2,null,4]."

If that is a correct characterization of your problem, then the data structure you are talking about is NOT a "list". Certainly, it is not compatible with the Java List API. In the List context, appending means adding an element immediately after the current last element of the list; i.e. at position == list.size().

Maybe you should be looking for a concurrent sparse array class. Here is one possibility:

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • `synchronizedList` simply wraps each operation in a `synchronized (mutex)` which is definitely not what I want. O(logN) lookup is not as fast as I expect is possible. – Alex Hall Oct 03 '15 at 01:38
  • Why is that definitely not what you want? – Stephen C Oct 03 '15 at 01:41
  • Reading should never need a lock because theoretically (e.g. in my suggested implementation) writers (i.e. appenders) don't get in the way of reads at all. At the very least readers should not be locking each other out, yet here they do. – Alex Hall Oct 03 '15 at 01:50
  • Re: your latest edit: I don't need random insertion, only appends, except that the position of the append can be beyond the end of the list. For example I might have a list [0,1,2] and want to insert a 4 at index 4 so my list will then be [0,1,2,null,4]. – Alex Hall Oct 03 '15 at 01:52
  • Well the only way you can get writers never blocking readers is to use copy-on-write, and that gives you O(N) updates. The real problem here is that the >>algorithms<< to do better are >>not known<<. – Stephen C Oct 03 '15 at 01:54
  • Re your last comment. That's NOT a list. That's an array with a flexible bound. – Stephen C Oct 03 '15 at 01:55
  • OK, I wasn't quite sure what to call it. In any case it does not need to implement the Java API for a list. If it did most of the methods would just throw exceptions for being unsupported. – Alex Hall Oct 03 '15 at 02:03
  • That would be bad design. In fact, almost none of the mutation operations in the `List` API would have the semantics that you need. – Stephen C Oct 03 '15 at 02:12
1

Java has a concurrent list implementation in java.util.concurrent. CopyOnWriteArrayList which is a thread-safe variant of ArrayList in which all mutative operations (add, set, and so on) are implemented by making a fresh copy of the underlying array.

From doc:

This is ordinarily too costly, but may be more efficient than alternatives when traversal operations vastly outnumber mutations, and is useful when you cannot or don't want to synchronize traversals, yet need to preclude interference among concurrent threads. The "snapshot" style iterator method uses a reference to the state of the array at the point that the iterator was created. This array never changes during the lifetime of the iterator, so interference is impossible and the iterator is guaranteed not to throw ConcurrentModificationException. The iterator will not reflect additions, removals, or changes to the list since the iterator was created. Element-changing operations on iterators themselves (remove, set, and add) are not supported. These methods throw UnsupportedOperationException.

All elements are permitted, including null.

As per your requirement:

Reads will dominate by far and ideally should be about as fast as normal ArrayList access, i.e. no locking. There is no need to insert elements in the middle, delete, or change the value at an index: the only mutation required is to be able to append a new element to the end of the list.

Appending an element at end will result in a fresh copy of underlying array (O(n)) and may be too expensive. I believe using Collection.synchronizedList may be a good option but that involves locking (blocking).

Also check this.

Community
  • 1
  • 1
akhil_mittal
  • 23,309
  • 7
  • 96
  • 95