Finally i found it!!!
The reason not just in performance and bits-mask operations (yes, they are faster, but not significantly). The real reason is to allow loop back the elements
capacity if we use sequential adding-removing operations. In other words: reuse released cells after remove()
operation.
Consider the following examples (initial capacity is 16):
Only add()
:
add 15 elements => head=0, tail=15
add more 5 elements => doubleCapacity()
=> head=0, tail=20, capacity=32

add()
-remove()
-add()
:
add 15 elements => head=0, tail=15
remove 10 elements => tail loops back to removed indexes => head=10, tail=15
add more 5 elements => the capacity remains 16, the elements[]
array is not rebuilt or reallocated! => new elements are added into the place of the removed elements to the beginning of the array => head=10, tail=4 (looped back to the start of the array from 15->0->1->2->3->4). Note the values 16-19 are inserted to the indexes 0-3

So, in this case using power of two and concise bit operations makes much more sense. With such approach the operations like if ( (tail = (tail + 1) & (elements.length - 1)) == head)
allow to assign and verify easily that the looped tail
does not overlap with the head
(yeah, the stupid snake where actually the tail bites the head :) )
The code snippet to play around:
ArrayDeque<String> q = new ArrayDeque<>(15); // capacity is 16
// add 15 elements
q.add("0"); q.add("1"); q.add("2"); q.add("3"); q.add("4");
q.add("5"); q.add("6"); q.add("7"); q.add("8"); q.add("9");
q.add("10"); q.add("11");q.add("12");q.add("13");q.add("14");
// remove 10 elements from the head => tail LOOPS BACK in the elements[]
q.poll();q.poll();q.poll();q.poll();q.poll();q.poll();q.poll();q.poll();q.poll();q.poll();
// add 5 elements => the elements[] is not reallocated!
q.add("15");q.add("16");q.add("17");q.add("18");q.add("19");
q.poll();