3

IntStream may be the easiest way, but with it I can only pick up the smallest M numbers, as below:

import java.util.Arrays;
import java.util.stream.IntStream;

public class Test {
    private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};

    public static void main(String[] args) throws Exception {
        // prints the 5 smallest values: [1, 2, 3, 4, 5]
        System.out.println(Arrays.asList(IntStream.of(arr).sorted().limit(5).boxed().toArray()));
    }
}

By the way, considering algorithmic complexity and assuming N >> M, the "sorted + limit" approach still has a complexity of O(N log(N)).

I think the best achievable complexity is O(N log(M)), but I do not know whether Java 8 has stream methods or collectors of this kind.
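
What I have in mind is something like a size-bounded min-heap (a plain-Java sketch of the O(N log(M)) idea, not a stream solution; the helper name largestM is just illustrative):

import java.util.PriorityQueue;

// keeps only the M largest values seen so far; each offer/poll is O(log(M))
static int[] largestM(int[] arr, int m) {
    PriorityQueue<Integer> heap = new PriorityQueue<>(m + 1); // min-heap
    for (int v : arr) {
        heap.offer(v);
        if (heap.size() > m)
            heap.poll(); // drop the smallest, keep the M largest
    }
    return heap.stream().mapToInt(Integer::intValue).toArray();
}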

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
auntyellow
  • 2,423
  • 2
  • 20
  • 47
  • 4
    `IntStream.of(arr).sorted().skip(N-M).boxed()` should do it. – user207421 Jun 11 '15 at 04:03
  • Related (from yesterday): http://stackoverflow.com/questions/30740471/java-how-to-sort-an-intstream-in-reverse-order#comment49537625_30740471 – yshavit Jun 11 '15 at 04:18
  • Found another related question: http://stackoverflow.com/questions/19227698/write-a-program-to-find-100-largest-numbers-out-of-an-array-of-1-billion-numbers But there seems to be no existing method in Java. – auntyellow Jun 11 '15 at 04:34
  • The quickselect algorithm, or related selection algorithms, do partial sorting in O(N) time. http://en.wikipedia.org/wiki/Selection_algorithm#Incremental_sorting_by_selection – Edward Doolittle Jun 11 '15 at 04:52
  • I think more important than the algorithmic time complexity is the memory complexity. With *N >> M* you would only need M instead of N intermediate storage. – the8472 Jun 11 '15 at 05:25

5 Answers

5

If you must use Streams:

IntStream.of(arr).sorted().skip(N-M)

Otherwise use a PriorityQueue and write yourself an inverting Comparator. Insertion will be O(N log(N)) and removal of M elements will be O(M log(N)). Not what you asked for, but maybe close enough.
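
A sketch of that non-stream alternative (the helper below is hypothetical; Comparator.reverseOrder() plays the role of the inverting Comparator):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

static List<Integer> largest(int[] arr, int m) {
    // max-heap over all N elements: N insertions at O(log(N)) each
    PriorityQueue<Integer> pq = new PriorityQueue<>(Comparator.reverseOrder());
    for (int v : arr)
        pq.add(v);
    // M removals at O(log(N)) each
    List<Integer> result = new ArrayList<>();
    for (int i = 0; i < m && !pq.isEmpty(); i++)
        result.add(pq.poll());
    return result;
}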

user207421
  • 305,947
  • 44
  • 307
  • 483
3

EJP has it right; I tested it, and it yields 8 and 9 when given an input of 2.

import java.util.stream.IntStream;
public class Test {
    private static final int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};

    public static void main(String[] args) throws Exception { 
        int n = Integer.parseInt(args[0]);
        System.out.println("Finding "+n+" largest numbers in arr");
        IntStream.of(arr).sorted().skip(arr.length-n).boxed().forEach(big -> System.out.println(big));
    }
}
ozborn
  • 980
  • 5
  • 24
  • Thanks. But how can the algorithmic complexity be reduced? – auntyellow Jun 11 '15 at 04:30
  • 1
    I'm not sure how you can reduce the complexity and still be using java-8. I think one of the big selling features of lambda expressions is that the underlying implementation isn't specified, leaving room for compiler optimizations. At least that is my understanding. – ozborn Jun 11 '15 at 04:35
2

If you are already using Google Guava in your project, you can take advantage of MinMaxPriorityQueue:

Collection<..> min5 = stream.collect(
    toCollection(MinMaxPriorityQueue.maximumSize(5)::create)
);
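
A more complete, hypothetical example (it assumes a static import of Collectors.toCollection; the builder's comparator is reversed so that the queue evicts the smallest value when full and therefore keeps the five largest):

import com.google.common.collect.MinMaxPriorityQueue;
import java.util.Collection;
import java.util.Comparator;
import java.util.stream.IntStream;
import static java.util.stream.Collectors.toCollection;

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
Collection<Integer> max5 = IntStream.of(arr).boxed().collect(toCollection(
        MinMaxPriorityQueue.orderedBy(Comparator.<Integer>reverseOrder())
                           .maximumSize(5)::create));
// max5 holds {5, 6, 7, 8, 9} (in no particular iteration order)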
Misha
  • 27,433
  • 6
  • 62
  • 78
2

It's possible to create a custom collector using the JDK PriorityQueue to solve your task:

public static <T> Collector<T, ?, List<T>> maxN(Comparator<? super T> comparator, 
                                                int limit) {
    BiConsumer<PriorityQueue<T>, T> accumulator = (queue, t) -> {
        queue.add(t);
        if (queue.size() > limit)
            queue.poll();
    };
    return Collector.of(() -> new PriorityQueue<>(limit + 1, comparator),
            accumulator, (q1, q2) -> {
                for (T t : q2) {
                    accumulator.accept(q1, t);
                }
                return q1;
            }, queue -> new ArrayList<>(queue));
}

Usage:

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.naturalOrder(), 2)));
// [8, 9]
System.out.println(IntStream.of(arr).boxed().collect(maxN(Comparator.reverseOrder(), 3)));
// [3, 1, 2]

It might be faster for big data sets and small limits as it does not sort. If you want a sorted result, you can add the sorting step to the finisher.
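
For instance, a finisher along these lines (just a sketch) could replace `queue -> new ArrayList<>(queue)` and return the elements in ascending order according to the supplied comparator:

queue -> {
    List<T> list = new ArrayList<>(queue);
    list.sort(comparator); // sort once, after collecting
    return list;
}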

Tagir Valeev
  • 97,161
  • 19
  • 222
  • 334
  • Great code. But the combiner seems useless (i.e. `(q1, q2) -> null` is ok) because PriorityQueue is not thread-safe. – auntyellow Jun 12 '15 at 08:55
  • 2
    @auntyellow: it's not necessary for a `Collector` to be thread-safe when using parallel streams (unless the `Collector` has the `CONCURRENT` characteristic). Existing collectors work fine in parallel with `HashMap`, `ArrayList`, etc. – Tagir Valeev Jun 12 '15 at 09:39
2

You can achieve your complexity goal by creating a histogram of the values:

public static IntStream maxValues(IntStream source, int limit) {
    TreeMap<Integer,Integer> m=new TreeMap<>();
    source.forEachOrdered(new IntConsumer() {
        int size, min=Integer.MIN_VALUE;
        public void accept(int value) {
            if(value<min) return;
            m.merge(value, 1, Integer::sum);
            if(size<limit) size++;
            else m.compute(min=m.firstKey(), (k,count)->count==1? null: count-1);
        }
    });
    if(m.size()==limit)// no duplicates
        return m.keySet().stream().mapToInt(Integer::valueOf);
    return m.entrySet().stream().flatMapToInt(e->{
        int value = e.getKey(), count = e.getValue();
        return count==1? IntStream.of(value): IntStream.range(0, count).map(i->value);
    });
}

It creates a map from int values to their corresponding number of occurrences, but limits its contents to the desired number of values. Hence, its operation has O(log(M)) complexity (worst case, if there are no duplicates) and, since the operation is performed once for each value, its overall complexity is O(N×log(M)), as you wished.

You may test it with your original array as

int[] arr = {5, 3, 4, 2, 9, 1, 7, 8, 6};
maxValues(Arrays.stream(arr), 3).forEach(System.out::println);

but to test some corner cases, you may use an array containing duplicates like

int[] arr = {8, 5, 3, 4, 2, 2, 9, 1, 7, 9, 8, 6};
// note that the stream of three max elements contains one of the two eights

If you strive for maximum performance, replacing the boxing TreeMap with an adequate data structure using primitive data types may be feasible, but that would be a minor performance optimization, as this solution already solves the complexity problem.

By the way, this solution works for arbitrary streams, i.e. doesn’t need to know the value of N.

Holger
  • 285,553
  • 42
  • 434
  • 765
  • Compared to PriorityQueue, TreeMap seems more complicated. – auntyellow Jun 15 '15 at 02:22
  • 2
    A `PriorityQueue` might be a working kludge as long as you have no duplicates, but when you have duplicates, it requires *storage space* for each of them instead of simply counting the number of occurrences as the map-based approach does. Anyway, the accepted answer's solution is even simpler, and it's not clear whether the difference in complexity (what you originally asked for) really makes a difference in performance. – Holger Jun 15 '15 at 08:01