47

I'm looking for a good sorted list for java. Googling around give me some hints about using TreeSet/TreeMap. But these components is lack of one thing: random access to an element in the set. For example, I want to access nth element in the sorted set, but with TreeSet, I must iterate over other n-1 elements before I can get there. It would be a waste since I would have upto several thousands elements in my Set.

Basically, I'm looking for some thing similar to a sorted list in .NET, with ability to add element fast, remove element fast, and have random access to any element in the list.

Has this kind of sorted list implemented somewhere? Thanks.

Edited

My interest in SortedList grows out of this problems: I need to maintains a list of many thousands object (and can grow up to many hundred of thousands). These objects will be persisted to database. I want to randomly select few dozens of element from the whole list. So, I tried to maintain a separated on-memory list that contains the primary keys (Long numbers) of all objects. I need to add/remove keys from the list when object is added / removed from database. I'm using an ArrayList right now but I'm afraid ArrayList would not suit it when the number of records grows. (Imagine you have to iterate over several hundred thousands of elements every time an object is removed from database). Back to the time when I did .NET programming, then I would use a sorted List (List is a .NET class that once Sorted property set to true, will maintain order of its element, and provide binary search that help remove/insert element very quick). I'm hoping that I can find some thing similar from java BCL but unluckily, I didn't find a good match.

Phương Nguyễn
  • 8,747
  • 15
  • 62
  • 96
  • 5
    TreeSet gives you log(n) lookup, not linear. – Stefan Kendall Apr 18 '10 at 04:13
  • Do you have any performance metrics which dictate that log(n) is too slow? You're almost certainly pre-optimizing. "Several thousands of elements" sorts in nanoseconds, and maintaining a treeset takes little to no time. log_10(thousands) = 5. – Stefan Kendall Apr 18 '10 at 04:17
  • 5
    @Stefan: TreeSet gives you a log(n) `contains` test (where n is the size), but no way to access the i-th element easily (where i is an arbitrary index). – Christian Semrau Apr 18 '10 at 11:02
  • @Stefan: If I want to get 20 random item from a list of 4000 item, then I would normally do 20*4000/2=40000 iteration. And if I need to get 20 random items times over times then the number of iterations would probably very high. That's why I hesitated in choosing TreeSet/TreeMap. – Phương Nguyễn Apr 18 '10 at 14:28
  • 1
    No, it wouldn't! That's the point, if you cared to read what I said. It would take 20 * log_2(4000) = 20*~12 = 240 iterations. This is NEGLIGIBLE on modern hardware! – Stefan Kendall Apr 18 '10 at 14:33
  • Now note: the actual log base will vary depending on how treeset is actually implemented, as well as insertion order, but the order of magnitude for operations should be correct. – Stefan Kendall Apr 18 '10 at 14:34
  • 1
    @Stefan: Thanks for the warning of pre-optimization. Actually, I'm using an ArrayList without sort right now. (Because, I don't actually need sort. I just need sort for faster adding, removing element). But if there is an optimized solution out there, why not adapt it? I believed that the solution should be around (because .NET has it for quite some time, so I guess Java should have it, too). – Phương Nguyễn Apr 18 '10 at 14:36
  • 1
    No, there's no reason to use a faster solution, ever, unless performance requirements dictate it. If a solution is intuitive, use it. Furthermore, if you work in interfaces instead of implementations, you can always switch the implementation later for performance benefits. This is no longer 1985; sorting 40,000 elements is no longer a performance burden. – Stefan Kendall Apr 18 '10 at 14:38
  • Also, "Because .NET has it" is possibly the worst reason I've heard to add a feature to an API. – Stefan Kendall Apr 18 '10 at 14:49
  • 2
    I'm not sure .NET even has this. `List` does not have a Sort or Sorted property. It does have a Sort method. This does not behave as you are asking for here (this is no different than Java `Collections.sort` method). Even in .NET, keeping a sorted list doesn't improve the remove time. – Kevin Brock Apr 18 '10 at 16:42
  • 1
    Your accepted answer is not about a sorted list. You really should update your question to reflect this. It confused me when I read the TreeList documentation and discovered TreeList was *NOT SORTED*. – Petriborg Oct 02 '12 at 14:56
  • I guess, the "sorted" property comes from Delphi (or alike). It really has nothing to do in an ordinary list as it requires a much more complicated data structure to be of any use - otherwise the performance is terrible. – maaartinus Dec 02 '19 at 15:22
  • There exists an enhancement request for the JDK requesting a `SortedList` ([JDK-4151747](https://bugs.openjdk.java.net/browse/JDK-4151747)), but that is open for a long time now without any activity. – Marcono1234 Mar 20 '22 at 17:07

10 Answers10

48

It seems that you want a list structure with very fast removal and random access by index (not by key) times. An ArrayList gives you the latter and a HashMap or TreeMap give you the former.

There is one structure in Apache Commons Collections that may be what you are looking for, the TreeList. The JavaDoc specifies that it is optimized for quick insertion and removal at any index in the list. If you also need generics though, this will not help you.

Foo Bar
  • 328
  • 3
  • 20
Kevin Brock
  • 8,874
  • 1
  • 33
  • 37
  • 1
    +1 for an implementation that should actually outperform a collection in the Java API. – Stefan Kendall Apr 18 '10 at 17:12
  • I think LinkedList wont suits here since it has to iterate to reach an element with particular index. But add/remove is faster. – Kanagavelu Sugumar Oct 15 '13 at 17:45
  • @Sugumar `TreeList` is not a `LinkedList` nor does it have behavior like a linked list (see the link which provides performance comparison between the two) so I don't understand your comment. You are correct though, `LinkedList` is not appropriate for what the question was asking for. – Kevin Brock Oct 15 '13 at 20:40
  • `TreeList` has disadvantage similar to `LinkedList` the elements are scattered and uses more pointers since it seems to use AVL nodes internal as implementation see: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.openjpa/openjpa-all/2.0.0/org/apache/commons/collections/list/TreeList.java . You can see that some operations there are not very optimal at all. So https://kjellkod.wordpress.com/2012/02/25/why-you-should-never-ever-ever-use-linked-list-in-your-code-again/ applies here. That's why you probably shall use only `ArrayList` out of all. – Mladen Adamovic May 01 '15 at 10:13
26

This is the SortedList implementation I am using. Maybe this helps with your problem:

import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedList;
/**
 * This class is a List implementation which sorts the elements using the
 * comparator specified when constructing a new instance.
 * 
 * @param <T>
 */
public class SortedList<T> extends ArrayList<T> {
    /**
     * Needed for serialization.
     */
    private static final long serialVersionUID = 1L;
    /**
     * Comparator used to sort the list.
     */
    private Comparator<? super T> comparator = null;
    /**
     * Construct a new instance with the list elements sorted in their
     * {@link java.lang.Comparable} natural ordering.
     */
    public SortedList() {
    }
    /**
     * Construct a new instance using the given comparator.
     * 
     * @param comparator
     */
    public SortedList(Comparator<? super T> comparator) {
        this.comparator = comparator;
    }
    /**
     * Construct a new instance containing the elements of the specified
     * collection with the list elements sorted in their
     * {@link java.lang.Comparable} natural ordering.
     * 
     * @param collection
     */
    public SortedList(Collection<? extends T> collection) {
        addAll(collection);
    }
    /**
     * Construct a new instance containing the elements of the specified
     * collection with the list elements sorted using the given comparator.
     * 
     * @param collection
     * @param comparator
     */
    public SortedList(Collection<? extends T> collection, Comparator<? super T> comparator) {
        this(comparator);
        addAll(collection);
    }
    /**
     * Add a new entry to the list. The insertion point is calculated using the
     * comparator.
     * 
     * @param paramT
     * @return <code>true</code> if this collection changed as a result of the call.
     */
    @Override
    public boolean add(T paramT) {
        int initialSize = this.size();
        // Retrieves the position of an existing, equal element or the 
        // insertion position for new elements (negative).
        int insertionPoint = Collections.binarySearch(this, paramT, comparator);
        super.add((insertionPoint > -1) ? insertionPoint : (-insertionPoint) - 1, paramT);
        return (this.size() != initialSize);
    }
    /**
     * Adds all elements in the specified collection to the list. Each element
     * will be inserted at the correct position to keep the list sorted.
     * 
     * @param paramCollection
     * @return <code>true</code> if this collection changed as a result of the call.
     */
    @Override
    public boolean addAll(Collection<? extends T> paramCollection) {
        boolean result = false;
        if (paramCollection.size() > 4) {
            result = super.addAll(paramCollection);
            Collections.sort(this, comparator);
        }
        else {
            for (T paramT:paramCollection) {
                result |= add(paramT);
            }
        }
        return result;
    }
    /**
     * Check, if this list contains the given Element. This is faster than the
     * {@link #contains(Object)} method, since it is based on binary search.
     * 
     * @param paramT
     * @return <code>true</code>, if the element is contained in this list;
     * <code>false</code>, otherwise.
     */
    public boolean containsElement(T paramT) {
        return (Collections.binarySearch(this, paramT, comparator) > -1);
    }
    /**
     * @return The comparator used for sorting this list.
     */
    public Comparator<? super T> getComparator() {
        return comparator;
    }
    /**
     * Assign a new comparator and sort the list using this new comparator.
     * 
     * @param comparator
     */
    public void setComparator(Comparator<? super T> comparator) {
        this.comparator = comparator;
        Collections.sort(this, comparator);
    }
}

This solution is very flexible and uses existing Java functions:

  • Completely based on generics
  • Uses java.util.Collections for finding and inserting list elements
  • Option to use a custom Comparator for list sorting

Some notes:

  • This sorted list is not synchronized since it inherits from java.util.ArrayList. Use Collections.synchronizedList if you need this (refer to the Java documentation for java.util.ArrayList for details).
  • The initial solution was based on java.util.LinkedList. For better performance, specifically for finding the insertion point (Logan's comment) and quicker get operations (https://dzone.com/articles/arraylist-vs-linkedlist-vs), this has been changed to java.util.ArrayList.
Konrad Holl
  • 616
  • 5
  • 12
  • What makes this a good implementation, and superior to the other answers offered? Please add this explanation to your answer, rather than just pasting in code. – Erick Robertson Sep 28 '12 at 14:29
  • 5
    Why are you extending LinkedList? Binary search will have O(n) as this is not random access collection. – Anatoliy Apr 24 '13 at 15:39
  • 1
    I guess addAll method can be improved a bit - now it has O(N^2) complexity, but if you could insert all the elements in bulk unsorted and then sort it again -> via Collectoins.sort() you could have a better complexity, like -> O(N logN + 2N). BTW - I guess LinkedList is a better alternative over ArrayList as it grows will less cost. – Anatoliy Apr 25 '13 at 08:27
  • 1
    @Anatoliy Took a while to get back to you - I like your comment. I modified the addAll method to use super.addAll like you suggested. I added an if statement for being able to choose which ever method is faster - addAll or multiple add calls. The value 4 is just a guess, though - no tests performed. – Konrad Holl Jul 05 '13 at 12:24
  • 2
    `Collectinos.binarySearch()` is O(n) on linked lists, so your add is no more efficient than looping to find the insertion point. The same criticism applies to the `contains` method. This can be a lot faster simply by replacing `LinkedList` with `ArrayList`. – Logan Pickup Oct 19 '16 at 03:55
  • 1
    Event with `ArrayList`, I'm afraid, it's not really fast. Concerning `addAll`, I would just call `super` and then `Collection#sort` as it uses TimSort which recognizes partly sorted collections. `+++` What worse: You're extending `ArrayList` and violating its contact (e.g., `add` is defined to add to the end of list). And the `ArrayList` violates your contract as there's `add(int index, E element)` which breaks the sorting. – maaartinus Dec 02 '19 at 15:36
  • 1
    @maaartinus even worse, extending a class not designed for extension, is easy to subvert. Like calling `Collections.swap(list, x, y)` and voila, the list is not in sorted order anymore. Or using `list.listIterator().add(element);` to add an element without calling an overridden method… – Holger Mar 21 '22 at 08:37
16

Phuong:

Sorting 40,000 random numbers:

0.022 seconds

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;


public class test
{
    public static void main(String[] args)
    {
        List<Integer> nums = new ArrayList<Integer>();
        Random rand = new Random();
        for( int i = 0; i < 40000; i++ )
        {
            nums.add( rand.nextInt(Integer.MAX_VALUE) );
        }

        long start = System.nanoTime();
        Collections.sort(nums);
        long end = System.nanoTime();

        System.out.println((end-start)/1e9);
    }
}   

Since you rarely need sorting, as per your problem statement, this is probably more efficient than it needs to be.

Stefan Kendall
  • 66,414
  • 68
  • 253
  • 406
  • 2
    Hi Stefan: Thanks for the bench mark. But, actually, what I really want at a sorted list is fast remove. I don't even mind sorting my list because I'm randomly selecting elements from it anyway. I'm interested in sorted list because sorted list would give great performance on removing/inserting elements. Now I'm working with several thousands, but I'd expect my data to grow to several hundred thousands. Without a true sorted list, then I don't think I can get along well with that. – Phương Nguyễn Apr 18 '10 at 15:13
  • 1
    @PhươngNguyễn I think LinkedList will give great performance on removing/inserting than a SortedList. – Kanagavelu Sugumar Oct 15 '13 at 17:44
3

SortedList decorator from Java Happy Libraries can be used to decorate TreeList from Apache Collections. That would produce a new list which performance is compareable to TreeSet. https://sourceforge.net/p/happy-guys/wiki/Sorted%20List/

thinker
  • 221
  • 3
  • 8
3

Depending on how you're using the list, it may be worth it to use a TreeSet and then use the toArray() method at the end. I had a case where I needed a sorted list, and I found that the TreeSet + toArray() was much faster than adding to an array and merge sorting at the end.

Brendan Long
  • 53,280
  • 21
  • 146
  • 188
  • 1
    @Long: Thanks. Your solution is pretty good. Except that with a high change set, then toArray() will be called many times. For example, if my set grows from 4000-4100 then I need to call toArray 100 times, each time is an iteration over 4000+ items, result in 400000 extra iterations. I'm looking for a solution that could eliminate that extra iterations as well. But well, like Stefan Kendall tried to communicate, it would be pre-optimization. – Phương Nguyễn Apr 18 '10 at 14:33
  • You're definitely right, you don't want to do this every time you add something. What I meant is that if you know you'll always be adding in batches, the TreeSet + toArray() might work. – Brendan Long Apr 18 '10 at 20:37
1

GlazedLists has a very, very good sorted list implementation

Marcono1234
  • 5,856
  • 1
  • 25
  • 43
Kevin Day
  • 16,067
  • 8
  • 44
  • 68
  • SortedList is log(n) lookup, much like TreeSet. – Stefan Kendall Apr 18 '10 at 04:14
  • 2
    Unlike TreeSet, SortedList allows random access to any given index, and seems therefore better suited. I don't know about any sorted list structure that allows O(log(n)) insertion and O(1) index access. – Christian Semrau Apr 18 '10 at 10:57
  • Hmm, it looks like a Desktop GUI component to me. Is there any *condensed* library on that stuff? – Phương Nguyễn Apr 18 '10 at 14:41
  • GlazedLists is most certainly not a GUI component. Give it a try. As for a condensed library (presumably something that has only the sorting functionality?) no. There's a *lot* of work that goes into this sort of thing, and it wouldn't make sense to do it for just one type of list. The whole GL approach is insanely elegant. – Kevin Day Apr 20 '10 at 03:48
  • Hah - and now I look at the javadocs of TreeList (from Commons), and it looks like they have done the work and kept it in a single class. GL is still a fantastic option for live and declaritive lists - I highly recommend it. – Kevin Day Apr 20 '10 at 03:56
1

What about using a HashMap? Insertion, deletion, and retrieval are all O(1) operations. If you wanted to sort everything, you could grab a List of the values in the Map and run them through an O(n log n) sorting algorithm.

edit

A quick search has found LinkedHashMap, which maintains insertion order of your keys. It's not an exact solution, but it's pretty close.

Dave DeLong
  • 242,470
  • 58
  • 448
  • 498
1

To test the efficiancy of earlier awnser by Konrad Holl, I did a quick comparison with what I thought would be the slow way of doing it:

package util.collections;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.ListIterator;

/**
 *
 * @author Earl Bosch
 * @param <E> Comparable Element
 *
 */
public class SortedList<E extends Comparable> implements List<E> {

    /**
     * The list of elements
     */
    private final List<E> list = new ArrayList();

    public E first() {
        return list.get(0);
    }

    public E last() {
        return list.get(list.size() - 1);
    }

    public E mid() {
        return list.get(list.size() >>> 1);
    }

    @Override
    public void clear() {
        list.clear();
    }

    @Override
    public boolean add(E e) {
        list.add(e);
        Collections.sort(list);
        return true;
    }

    @Override
    public int size() {
        return list.size();
    }

    @Override
    public boolean isEmpty() {
        return list.isEmpty();
    }

    @Override
    public boolean contains(Object obj) {
        return list.contains((E) obj);
    }

    @Override
    public Iterator<E> iterator() {
        return list.iterator();
    }

    @Override
    public Object[] toArray() {
        return list.toArray();
    }

    @Override
    public <T> T[] toArray(T[] arg0) {
        return list.toArray(arg0);
    }

    @Override
    public boolean remove(Object obj) {
        return list.remove((E) obj);
    }

    @Override
    public boolean containsAll(Collection<?> c) {
        return list.containsAll(c);
    }

    @Override
    public boolean addAll(Collection<? extends E> c) {

        list.addAll(c);
        Collections.sort(list);
        return true;
    }

    @Override
    public boolean addAll(int index, Collection<? extends E> c) {
        throw new UnsupportedOperationException("Not supported.");
    }

    @Override
    public boolean removeAll(Collection<?> c) {
        return list.removeAll(c);
    }

    @Override
    public boolean retainAll(Collection<?> c) {
        return list.retainAll(c);
    }

    @Override
    public E get(int index) {
        return list.get(index);
    }

    @Override
    public E set(int index, E element) {
        throw new UnsupportedOperationException("Not supported.");
    }

    @Override
    public void add(int index, E element) {
        throw new UnsupportedOperationException("Not supported.");
    }

    @Override
    public E remove(int index) {
        return list.remove(index);
    }

    @Override
    public int indexOf(Object obj) {
        return list.indexOf((E) obj);
    }

    @Override
    public int lastIndexOf(Object obj) {
        return list.lastIndexOf((E) obj);
    }

    @Override
    public ListIterator<E> listIterator() {
        return list.listIterator();
    }

    @Override
    public ListIterator<E> listIterator(int index) {
        return list.listIterator(index);
    }

    @Override
    public List<E> subList(int fromIndex, int toIndex) {
        throw new UnsupportedOperationException("Not supported.");
    }

}

Turns out its about twice as fast! I think its because of SortedLinkList slow get - which make's it not a good choice for a list.

Compared times for same random list:

  • SortedLinkList : 15731.460
  • SortedList : 6895.494
  • ca.odell.glazedlists.SortedList : 712.460
  • org.apache.commons.collections4.TreeList : 3226.546

Seems glazedlists.SortedList is really fast...

EarlB
  • 111
  • 2
  • 7
  • It's faster than the answer by Konrad Holl because the other answer uses `LinkedList` as its base list, and performs several very slow operations on it (binarySearch, in particular, is slow on a linked list unless comparisons are extremely expensive compared to link traversals). – Logan Pickup Oct 19 '16 at 03:52
1

Generally you can't have constant time look up and log time deletions/insertions, but if you're happy with log time look ups then you can use a SortedList.

Not sure if you'll trust my coding but I recently wrote a SortedList implementation in Java, which you can download from http://www.scottlogic.co.uk/2010/12/sorted_lists_in_java/. This implementation allows you to look up the i-th element of the list in log time.

Mark Rhodes
  • 10,049
  • 4
  • 48
  • 51
0

You need no sorted list. You need no sorting at all.

I need to add/remove keys from the list when object is added / removed from database.

But not immediately, the removal can wait. Use an ArrayList containing the ID's all alive objects plus at most some bounded percentage of deleted objects. Use a separate HashSet to keep track of deleted objects.

private List<ID> mostlyAliveIds = new ArrayList<>();
private Set<ID> deletedIds = new HashSet<>();

I want to randomly select few dozens of element from the whole list.

ID selectOne(Random random) {
    checkState(deletedIds.size() < mostlyAliveIds.size());
    while (true) {
        int index = random.nextInt(mostlyAliveIds.size());
        ID id = mostlyAliveIds.get(index);
        if (!deletedIds.contains(ID)) return ID;
    }
}

Set<ID> selectSome(Random random, int count) {
    checkArgument(deletedIds.size() <= mostlyAliveIds.size() - count);
    Set<ID> result = new HashSet<>();
    while (result.size() < count) result.add(selectOne(random));
}

For maintaining the data, do something like

void insert(ID id) {
    if (!deletedIds.remove(id)) mostlyAliveIds.add(ID);
} 

void delete(ID id) {
    if (!deletedIds.add(id)) {
         throw new ImpossibleException("Deleting a deleted element);
    }
    if (deletedIds.size() > 0.1 * mostlyAliveIds.size()) {
        mostlyAliveIds.removeAll(deletedIds);
        deletedIds.clear();
    }
}

The only tricky part is the insert which has to check if an already deleted ID was resurrected.

The delete ensures that no more than 10% of elements in mostlyAliveIds are deleted IDs. When this happens, they get all removed in one sweep (I didn't check the JDK sources, but I hope, they do it right) and the show goes on.

With no more than 10% of dead IDs, the overhead of selectOne is no more than 10% on the average.

I'm pretty sure that it's faster than any sorting as the amortized complexity is O(n).

maaartinus
  • 44,714
  • 32
  • 161
  • 320