5

I have a question about java collections such as Set or List. More generally objects that you can use in a for-each loop. Is there any requirement that the elements of them actually has to be stored somewhere in a data structure or can they be described only from some sort of requirement and calculated on the fly when you need them? It feels like this should be possible to be done, but I don't see any of the java standard collection classes doing anything like this. Am I breaking any sort of contract here?

The thing I'm thinking about using these for is mainly mathematics. Say for example I want to have a set representing all prime numbers under 1 000 000. It might not be a good idea to save these in memory but to instead have a method check if a particular number is in the collection or not.

I'm also not at all an expert at java streams, but I feel like these should be usable in java 8 streams since the objects have very minimal state (the objects in the collection doesn't even exist until you try to iterate over them or check if a particular object exists in the collection).

Is it possible to have Collections or Iterators with virtually infinitely many elements, for example "all numbers on form 6*k+1", "All primes above 10" or "All Vectors spanned by this basis"? One other thing I'm thinking about is combining two sets like the union of all primes below 1 000 000 and all integers on form 2^n-1 and list the mersenne primes below 1 000 000. I feel like it would be easier to reason about certain mathematical objects if it was done this way and the elements weren't created explicitly until they are actually needed. Maybe I'm wrong.

Here's two mockup classes I wrote to try to illustrate what I want to do. They don't act exactly as I would expect (see output) which make me think I am breaking some kind of contract here with the iterable interface or implementing it wrong. Feel free to point out what I'm doing wrong here if you see it or if this kind of code is even allowed under the collections framework.

import java.util.AbstractSet;
import java.util.Iterator;

public class PrimesBelow extends AbstractSet<Integer>{

    int max;
    int size;

    public PrimesBelow(int max) {
        this.max = max;
    }

    @Override
    public Iterator<Integer> iterator() {
        return new SetIterator<Integer>(this);
    }

    @Override
    public int size() {
        if(this.size == -1){
            System.out.println("Calculating size");
            size = calculateSize();
        }else{
            System.out.println("Accessing calculated size");
        }
        return size;
    }

    private int calculateSize() {
        int c = 0;
        for(Integer p: this)
            c++;
        return c;
    }

    public static void main(String[] args){
        PrimesBelow primesBelow10 = new PrimesBelow(10);
        for(int i: primesBelow10)
            System.out.println(i);
        System.out.println(primesBelow10);
    }
}

.

import java.util.Iterator;
import java.util.NoSuchElementException;

public class SetIterator<T> implements Iterator<Integer> {
    int max;
    int current;
    public SetIterator(PrimesBelow pb) {
        this.max= pb.max;
        current = 1;
    }

    @Override
    public boolean hasNext() {
        if(current < max) return true;
        else return false;
    }

    @Override
    public Integer next() {
        while(hasNext()){
            current++;
            if(isPrime(current)){
                System.out.println("returning "+current);
                return current;
            }
        }
        throw new NoSuchElementException();
    }

    private boolean isPrime(int a) {
        if(a<2) return false;
        for(int i = 2; i < a; i++) if((a%i)==0) return false;
        return true;
    }
}

Main function gives the output
returning 2
2
returning 3
3
returning 5
5
returning 7
7
Exception in thread "main" java.util.NoSuchElementException
    at SetIterator.next(SetIterator.java:27)
    at SetIterator.next(SetIterator.java:1)
    at PrimesBelow.main(PrimesBelow.java:38)

edit: spotted an error in the next() method. Corrected it and changed the output to the new one.

user1661303
  • 539
  • 1
  • 7
  • 20
  • Related: https://stackoverflow.com/questions/33826306/is-it-possible-to-create-a-lazy-better-infinite-collection-defined-by-recursio – shmosel Aug 17 '17 at 19:43
  • I think you don't want to implement the entire `Collection` interface for these; you only want to implement the membership methods like `contains` and `iterator`, right? – wmorrell Aug 17 '17 at 19:45
  • 1
    Do you know about the `Iterable` interface? It's a super of Collection, and only contains the `iterator()` function. Since it doesn't even have a `size()` method, and doesn't define a contract for equals/hashCode, it is quite suitable for lazily-generated infinite sequences. And it can be used in a for-each loop. – yshavit Aug 17 '17 at 19:54
  • I mean, I can definitely see uses for other methods as well, such as addAll. The thing I'm mostly tempted by is having the ability to iterate over an infinite set of data structures in a nice for-each loop before actually creating any of the objects until they are needed. – user1661303 Aug 17 '17 at 19:55
  • @user1661303 the `hasNext()` in your example still does not match the implementation of `next()` - if only returns false if it reached `max`, but would have too return whether there actually is a prime left before reaching `max` – Hulk Aug 21 '17 at 12:17

2 Answers2

1

Well, as you see with your (now fixed) example, you can easily do it with Iterables/Iterators. Instead of having a backing collection, the example would've been nicer with just an Iterable that takes the max number you wish to calculate primes to. You just need to make sure that you handle the hasNext() method properly so you don't have to throw an exception unnecessarily from next().

Java 8 streams can be used easier to perform these kinds of things nowadays, but there's no reason you can't have a "virtual collection" that's just an Iterable. If you start implementing Collection it becomes harder, but even then it wouldn't be completely impossible, depending on the use cases: e.g. you could implement contains() that checks for primes, but you'd have to calculate it and it would be slow for large numbers.

A (somewhat convoluted) example of a semi-infinite set of odd numbers that is immutable and stores no values.

public class OddSet implements Set<Integer> {
    public boolean contains(Integer o) {
        return o % 2 == 1;
    }
    public int size() {
        return Integer.MAX_VALUE;
    }
    public boolean add(Integer i) {
        throw new OperationNotSupportedException();
    }

    public boolean equals(Object o) {
        return o instanceof OddSet;
    }
    // etc. etc.
}
Kayaman
  • 72,141
  • 5
  • 83
  • 121
  • I was thinking about doing it with just an Iterator/extending Iterable instead of AbstractSet but I wasn't sure how it would handle concurrency and multiple threads asking to iterate over the data structure. I figured if the set returns a new Iterator every time it's asked for it each of them would keep track of their own current number and there wouldn't be any conflicts. Not sure if I'm thinking correct though. I also thought the contains() method seemed nice to have, so that's what drove me to extending the AbstractSet class instead of just the Iterator. – user1661303 Aug 17 '17 at 20:23
  • Your example code seems nice. I can't seem to iterate over it in a for each loop though. What exactly should I change in the next() method to make it not propagate the Exception upwards? The spec says it should throw this error if there is no more value, but the method signature doesn't have throw so I'm a bit confused. – user1661303 Aug 17 '17 at 20:27
  • 2
    Yeah, you need to create an `OddSetIterator` which would then perform the actual iteration, and maybe have a size limit too. Concurrency isn't an issue, as `iterator()` can and should return a new iterator every time. Iterators themselves are rarely thread-safe, and you wouldn't want to share one between threads anyway. Your example is flawed in that `hasNext()` can return true, but `next()` can still throw an exception. You need more trickery and calculation. – Kayaman Aug 17 '17 at 20:31
  • ...and that trickery and calculation is the reason why you don't really see this kind of thing done, except with Java 8 streams which makes it easy to generate things on the fly and without having to satisfy the complex contracts of the `Collection` classes. – Kayaman Aug 17 '17 at 20:39
0

As DwB stated, this is not possible to do with Java's Collections API, as every element must be stored in memory. However, there is an alternative: this is precisely why Java's Stream API was implemented!

Streams allow you to iterate across an infinite amount of objects that are not stored in memory unless you explicitly collect them into a Collection.

From the documentation of IntStream#iterate:

Returns an infinite sequential ordered IntStream produced by iterative application of a function f to an initial element seed, producing a Stream consisting of seed, f(seed), f(f(seed)), etc.

The first element (position 0) in the IntStream will be the provided seed. For n > 0, the element at position n, will be the result of applying the function f to the element at position n - 1.

Here are some examples that you proposed in your question:

public class Test {

    public static void main(String[] args) {
        IntStream.iterate(1, k -> 6 * k + 1);
        IntStream.iterate(10, i -> i + 1).filter(Test::isPrime);
        IntStream.iterate(1, n -> 2 * n - 1).filter(i -> i < 1_000_000);
    }

    private boolean isPrime(int a) {
        if (a < 2) {
            return false;
        }

        for(int i = 2; i < a; i++) {
            if ((a % i) == 0) {
                return false;
            }

            return true;
        }
    }
}
Community
  • 1
  • 1
Jacob G.
  • 28,856
  • 5
  • 62
  • 116
  • No matter how many times I look, a rigorous look in the javadoc never says explicitly that the elements in a collection needs to be stored in memory. People just seemed to have inferred this. https://docs.oracle.com/javase/7/docs/api/java/util/Collection.html I've watched a lot of talk about streams, but I am a bit confused about the need for them as a new concept. They just seem like specialized Iterators to me, that keep track of a little bit more information and evaluates lazily. – user1661303 Aug 19 '17 at 12:12
  • 2
    "as every element must be stored in memory" - according to which part of the contract? – Hulk Aug 21 '17 at 12:09
  • Exactly. I don't like the idea of implicit contracts. It opens up the door to all kinds of complications. – user1661303 Aug 27 '17 at 12:17