3

Is there a collection data structure in Java (and C# if you know) that can be iterated over, with the following properties:

  • The current element can be removed without affecting the current iterator (the rest of the already-started iterator's iterations).
  • New elements can be added, but will also not affect the current iterator—not be included as an iterated value while the current iterator's iteration is still going. In my case only one new element would be added per iteration, but none should be seen until a new iterator is fetched from the iterable.
  • The order of elements doesn’t matter.

In effect, there is an incoming list and an outgoing list of items. The incoming list is iterated and some are copied to a new list. Some new elements can be added to the new list during iteration. After the iteration is over, the old incoming list is replaced with the new outgoing list. This whole process is itself within a loop.

So, copying the elements to a newly constructed collection object each time seems inefficient compared to one that had these add/remove properties.

I was kind of thinking of some kind of queue that lets me preview the current item and then either dequeue it or not, and move onto the next item. And I can add more items to the head of the queue but won’t see them because I’m moving toward the end. A doubly linked list could have these properties, right?

If you really want to know what it’s for, it’s to soup up the second large code block in an answer of mine.

ErikE
  • 48,881
  • 23
  • 151
  • 196
  • For C#: The usual Collections cannot be modified from within an enumerator context, but you might check out [TPL Dataflow Library](https://msdn.microsoft.com/en-us/library/mt604386(v=vs.111).aspx?f=255&MSPPError=-2147217396) – Pretasoc Sep 10 '18 at 07:59
  • In java, see [`List#listIterator`](https://docs.oracle.com/javase/8/docs/api/java/util/List.html#listIterator--) and [`ListIterator`](https://docs.oracle.com/javase/8/docs/api/java/util/ListIterator.html#add-E-) – Misha Sep 10 '18 at 08:06
  • Why you ask for C# and java? If you need it for java but there was a class in C#, how did it help you? – Tim Schmelter Sep 10 '18 at 08:10
  • @TimSchmelter It helps me because I program in both and I want to know for my own learning. That’s not so strange, is it? – ErikE Sep 10 '18 at 08:13
  • I can only second @Misha, in Java, just use a mutable list and acquire a `ListIterator`. But in-place modifications are not always more efficient than populating a new list. – Holger Sep 10 '18 at 09:20
  • Given your requirements, I think creating a new list with the items you want in it (as you are already doing) is the right way to go. – Matthew Watson Sep 10 '18 at 11:30
  • Very weird that someone would down-vote this. Can anyone guess what the objection is? Oh... just saw the close vote. I can't understand that, either. How is this not about programming according to the rules? It's a specific question, solving a specific problem, with the real possibility of discrete, answerable, objectively-assessable answers. – ErikE Sep 10 '18 at 18:04

3 Answers3

1

In C# this is easy enough to do with List<T> and for (...) rather than foreach (...):

using System;
using System.Collections.Generic;
using System.Linq;

namespace Demo
{
    static class Program
    {
        static void Main()
        {
            List<int> list = Enumerable.Range(1, 10).ToList();

            for (int i = 0; i < list.Count; ++i)
            {
                if ((list[i] % 3) == 0) // Remove multiples of 3.
                    list.RemoveAt(i--); // NOTE: Post-decrement i
                else if ((list[i] % 4) == 0) // At each multiple of 4, add (2*value+1)
                    list.Add(list[i] * 2 + 1);
                else
                    ; // Do nothing.
            }

            Console.WriteLine(string.Join(", ", list)); // Outputs 1, 2, 4, 5, 7, 8, 10, 17
        }
    }
}

The key here is to use an index rather than foreach, and to not change anything before the current index (which from reading your requirements, is not needed).

However if you do need to add or remove elements before the current index, then this approach doesn't work (or at least, it becomes a lot more complicated).

Matthew Watson
  • 104,400
  • 10
  • 158
  • 276
  • What is the performance of removing an item from the middle of a list? – ErikE Sep 10 '18 at 08:19
  • @ErikE: https://stackoverflow.com/questions/6052003/how-expensive-is-list-removeat0-for-a-generic-list (MSDN: _"This method is an O(n) operation, where n is (Count - index)"_) – Tim Schmelter Sep 10 '18 at 08:25
  • I would expect a 9 in the final list because added elements shouldn’t be iterated over. – ErikE Sep 10 '18 at 08:26
  • @MatthewWatson That’s what I was afraid of... shuffling all the remaining items. – ErikE Sep 10 '18 at 08:28
  • @ErikE You should add that to your requirements then. They say `New elements can be added, but will not affect the current iteration.` whereas what you meant was `New elements can be added, but will not affect the current iteration **or any future iteration**.` – Matthew Watson Sep 10 '18 at 08:28
  • Given your actual requirements, I think that just creating a new collection will be the best and easiest approach. – Matthew Watson Sep 10 '18 at 08:29
  • @MatthewWatson I tried to make that requirement clear, I’m sorry it was ambiguous. “Iteration” can refer to the entire process, or it can be one of many. Like with an IEnumerable, you ask it for an IEnumerator, and then you are enumerating, but another thread could begin enumerating the same IEnumerator and get a different list. The second sentence after the ambiguous one I also added earlier to try to make it clearer. – ErikE Sep 10 '18 at 08:35
1

For C# you can use LinkedList<T> like in the good ol' C:

public DoStuff<T>(LinkedList<T> list)
{
    var node = list.First;

    while(node != null)
    {
        // do stuff with node

        node = node.Next;
    }
}

The node is of type LinkedListNode<T>. You can access the value with node.Value, remove with list.Remove(node). For T elem you also have list.AddAfter(node, elem), list.AddBefore(node, elem), list.AddFirst(elem) and list.AddLast(elem). All of these operations are O(1). You can do all sorts of iterations with this, if you want to iterate only over original elements, then cache the next node before doing anything and remember the last node:

var lastNode = list.Last;
var node = list.First;

while(node != lastNode.Next)
{
    var nextNode = node.Next;

    // do stuff with node

    node = nextNode;
}

The equivalent data structure in Java is also called LinkedList<E>. However, ListIterator<E> on a standard List<E> might be cleaner to use.

V0ldek
  • 9,623
  • 1
  • 26
  • 57
  • How does one avoid seeing new elements during the iteration process if one uses the ListIterator? – ErikE Sep 10 '18 at 08:38
  • You could just skip the added element. `listIterator.add(e);` and then immediately `listIterator.next();`. – V0ldek Sep 10 '18 at 09:21
1

In java there is the CopyOnWriteArrayList which does what you want: It makes a copy of the backing array every time you change anything. But that does mean any iteration is 'set in stone' once you begin iteration, and therfore you can remove/add to the underlying collection at will without affecting any running iterators whatsoever.

You can also build your own collection type that has this behaviour. It'd be a 3 liner:

public class ConstantIterationArrayList<T> extends ArrayList<T> {
    public Iterator<T> iterator() {
        return new ArrayList<T>(this).iterator();
    }
}

(The above makes a copy of the list and then gives you an iterator for the copy, thus conveniently ensuring any modification to this list has absolutely no effect on that iterator).

Here's the real problem with your question:

The above will make copies of the underlying data store from time to time (my snippet above does so every time you make an iterator. CopyOnWriteArrayList does so every time you call remove() or add()). The operation 'copy the underlying data store' takes O(n) time, as in, it takes twice as long for a list that is twice as large.

ArrayList in general has the property that the remove() operation, unless you are removing an element at or very close to the end of the list, is an O(n) operation: Removing an element from a list takes twice as long if the list is twice as large.

Fortunately, modern CPUs have sizable caches and can work exceedingly fast within a cache page. Which translates to: Despite the fact that copying data over feels inefficient, in practice, as long as the backing array fits within a page or so, it's much faster than data stores based on LinkedList semantics. We're talking about up to ~1000 elements give or take. (Note, in general, almost everything you do to a LinkedList is O(n) and where ArrayList tends to work well with modern CPU architecture, LinkedList tends to do very poorly. The point being: LinkedList is very rarely the right answer either!)

So, if you have no more than ~1000 items in this list, I'd go ahead with CopyOnWriteArrayList or the custom class I wrote for you above.

However, if you have more than that, ArrayList is not the right data store to use here. Even if you forget about your constant iteration needs for now; calling remove() on large array lists is a bad idea (unless removing very close to the end of the list). In this case I'd sketch out precisely which operations you need to do on this data type and exactly which ones really need to be fast, and once you have a complete list, try to find a collection type which fits your needs exactly, and in the (likely) case nothing specific exists that is a perfect match, make one yourself. Like above, when you have to roll your own data type its usually a good idea to let the bulk of the work be done by existing data types, so either extend an existing one, or encapsulate one.

rzwitserloot
  • 85,357
  • 5
  • 51
  • 72
  • It sounds then like the implementation I initially wrote in the linked answer (at the bottom of my question) is probably close to the best I'm going to get, eh? – ErikE Sep 10 '18 at 18:09