1

Is there an efficient way to get the n upper entries from a sorted Multiset (TreeMultiset)?

To specify what I mean I post my inefficient solution:

public SortedMultiset<DataTuple> headMultiset(int upperBound, BoundType boundType){
    int i=0;
    DataTuple act= this.coreData.firstEntry().getElement();
    Iterator<DataTuple> itr = this.coreData.iterator();
    while(i<=upperBound){
        act = itr.next();
        i+=this.coreData.count(act);
    }
    return headMultiset(act, boundType);
}

In this example DataSet can be seen as Object and this.coreData is the underling TreeMultiset.

I'm really new to that topic, so all kinds of comments would be appreciated.

user1078195
  • 105
  • 2
  • 5
  • As a first approach I used a NavigableMap like proposed in [that topic](http://stackoverflow.com/questions/1314650/using-java-map-for-range-searches). But this is in connection with a lot of overhead. I guess that there is something like a Map inside the TreeMultiset. – user1078195 Dec 02 '11 at 23:31

2 Answers2

1

I'm not 100% sure what result you're looking for. Let's take an example: say the multiset has contents [5 x a, 3 x b, 7 x c, 2 x d, 5 x e]. (As in Multiset.toString(), I am writing "count x object" to represent count occurrences of object.) If I understand the problem correctly, if n is 5, then the result you want is [5 x a], correct?

(It's also not clear whether you want the size of the result multiset to "round." For example: if n was 6 in the above multiset, would you want [5 x a, 1 x b], [5 x a], or [5 x a, 3 x b] ?)

For the moment, I'll assume that you want to round up, that is, you would expect [5 x a, 3 x b]. Then your answer isn't that far off, although I think it's subtly wrong as written. Here's how I would write it:

public <E> SortedMultiset<E> takeElements(SortedMultiset<E> multiset, int n) {
    if (n == 0) { return ImmutableSortedMultiset.of(); }
    Iterator<Multiset.Entry<E>> iterator = multiset.entrySet().iterator();
    E cutoff = null;
    for (int count = 0; count < n && iterator.hasNext(); ) {
        Multiset.Entry<E> entry = iterator.next();
        count += entry.getCount();
        cutoff = entry.getElement();
    }
    if (count < n) { return multiset; }
    // cutoff is not null, since the loop must iterate at least once
    return multiset.headMultiset(cutoff, BoundType.CLOSED);
}
Louis Wasserman
  • 191,574
  • 25
  • 345
  • 413
  • Thanks for your improvement. I am not sure if I want to round up or down. I thought to get the bound type through solves that problem. – user1078195 Dec 03 '11 at 09:54
  • Thanks for your improvement. (I'didn't mange to figure out how to iterate through the entry set:-() I am not sure if I want to round up or down. I thought to get the bound type through solves that problem. So that means that I'd like to get [5 x a] for a open and [5 x a, 3 x b] for the closed bound type. However I think that there is a more powerful way to implement that. While debugging I saw that the private Tree of the TreeMultiset stores the sizes for left and right as well as the distinct elements. I think If it was possible to get this tree one could implement the method much faster. – user1078195 Dec 03 '11 at 10:05
  • This is true, but exposing the tree from Guava would tie TreeMultiset to this implementation forever, so it's not particularly likely. (I'm the author of TreeMultiset, FYI.) – Louis Wasserman Dec 03 '11 at 15:36
  • If you're using ImmutableSortedMultiset, mind you, you could call multiset.elementSet().asList() and do a binary search for the cutoff point, which would give you an O((log n)^2) algorithm -- but if you're using TreeMultiset, the above algorithm is going to be the best you can do. – Louis Wasserman Dec 03 '11 at 16:09
  • Sorry for that stupid question, but where do I find the ImmutableSortedMultiset? – user1078195 Dec 03 '11 at 16:49
  • Even a draft-version of ImmutableSortedMultiset would help me a lot. Because in my case a update of the data in the multiset is not planned- up to now. – user1078195 Dec 05 '11 at 23:01
  • It's already accepted as Guava issue 427 (http://code.google.com/p/guava-libraries/issues/detail?id=427). – Louis Wasserman Dec 06 '11 at 00:37
  • I tested with the source from the git repository. Even though the asList method was faster in most cases, in some cases this method consumed a lot of time. In average the solution with the NavigableMap consumed just 60% of the time. If you are interested I can provide the details. – user1078195 Dec 14 '11 at 17:05
0

Actually the solution with the HashMap seems to have a acceptable performance. I built the hash map via:

public NavigableMap<Integer, E> BuildHashMap (SortedMultiset<E> multiset){
    NavigableMap<Integer, E>  ret = new TreeMap<Integer, E>();
    int n = 0;
    for (Entry<E> e : multiset.entrySet()) {
        ret.put(n, e.getElement());
        n += e.getCount();
    }
    return ret;
}

and access it with .floorEntry(n).getValue().

However elementSet().asList() is the function I'm actually looking for.

user1078195
  • 105
  • 2
  • 5
  • This will only improve your performance if you need to do the lookup for many different values of n. That said, if what you want is the *elements* in this thing, what you want is ret.headMap(n, true).values(). – Louis Wasserman Dec 03 '11 at 21:16
  • Yes. One of my applications is to divide the Multiset into a List of Multisets with about the same size. e.g. diving your example (size 22) data in 2 parts would end in a [[5 x a, 3 x b, 7 x c],[2 x d, 5 x e]] – user1078195 Dec 04 '11 at 16:55