4

I need to repeatedly (hundred of thousands of times) retrieve an element (different each time) from a Collection which contains dozens of thousand of Objects.

What is the quickest way to do this retrieval operation? At the moment my Collection is a List and I iterate on it until I have found the element, but is there a quicker way? Using a Map maybe? I was thinking to do:

  • Putting the Objects in a Map, with the key being the id field of the Object, and the Object itself being the value.
  • Then doing get(id) on the Map should be much faster than looping through a List.
  • If that is a correct way to do it, should I use a HashMap or TreeMap? - my objects have no particular ordering.

Any advice on the matter would be appreciated!

Last note: if an external library provides a tool to answer this I'd take it gladly!

seinecle
  • 10,118
  • 14
  • 61
  • 120
  • 1
    How do you identify which element to retrieve? Everything else depends on that. – chrylis -cautiouslyoptimistic- Jan 15 '16 at 09:10
  • 1
    TreeMap keeps the data sorted as per the comparator defined. Pros and cons of using `HashMap` against `TreeMap` or vice a versa, depends on your implementation. – SacJn Jan 15 '16 at 09:10
  • The case is the following: networks are made of relations between nodes (the source and the target). I have a Collection of relations. A relation is an object with a field String "source" (the id of the node source) and a field String "target" (the id of the node target). Relations have other fields as well. For each relation, I need to retrieve the source node and the target node. These nodes are kept in a separate Collection. – seinecle Jan 15 '16 at 12:30

4 Answers4

5

As per the documentation of the Tree Map (emphasis my own):

The map is sorted according to the natural ordering of its keys, or by a Comparator provided at map creation time, depending on which constructor is used.

In your case, you state that the items have no particular order and it does not seem that you are after any particular order, but rather just be able to retrieve data as fast as possible.

HashMaps provide constant read time but do not guarantee order, so I think that you should go with HashMaps:

This class makes no guarantees as to the order of the map; in particular, it does not guarantee that the order will remain constant over time. This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.

As a side note, the memory footprint of this can get quite high quite fast, so it might also be a good idea to look into a database approach and maybe use a cache like mechanism to handle more frequently used information.

npinti
  • 51,780
  • 5
  • 72
  • 96
  • Thank you. Yes the issue of the memory footprint is worrying. I'd like to keep everything in memory. I wish there was a simple way (no profiling...) to get the memory taken by my Map at anytime. – seinecle Jan 17 '16 at 10:01
3

I've created code which tests the proformance of BinarySearch, TreeMap and HashMap for the given problem.

In case you are rebuilding the collection each time, HashMap is the fastest (even with standard Object's hashCode() implementation!), sort+array binary search goes second and a TreeMap is last (due to complex rebuilding procedure).

proc array: 2395
proc tree : 4413
proc hash : 1325

If you are not rebuilding the collection, HashMap is still the fastest, an array's binary search is second and a TreeMap is the slowest, but with only slightly lower speed than array.

proc array: 506
proc tree : 561
proc hash : 122

Test code:

public class SearchSpeedTest {

    private List<DataObject> data;
    private List<Long> ids;
    private Map<Long, DataObject> hashMap;
    private Map<Long, DataObject> treeMap;
    private int numRep;
    private int dataAmount;
    private boolean rebuildEachTime;

    static class DataObject implements Comparable<DataObject>{
        Long id;

        public DataObject(Long id) {
            super();
            this.id = id;
        }

        public DataObject() {
            // TODO Auto-generated constructor stub
        }

        @Override
        public final int compareTo(DataObject o) {
            return Long.compare(id, o.id);
        }

        public Long getId() {
            return id;
        }

        public void setId(Long id) {
            this.id = id;
        }

        public void dummyCode() {

        }

    }

    @FunctionalInterface
    public interface Procedure {
        void execute();
    }

        public void testSpeeds() {

            rebuildEachTime = true;
            numRep = 100;
            dataAmount = 60_000;

            data = new ArrayList<>(dataAmount);
            ids = new ArrayList<>(dataAmount);
            Random gen = new Random();

            for (int i=0; i< dataAmount; i++) {
                long id = i*7+gen.nextInt(7);
                ids.add(id);
                data.add(new DataObject(id));
            }
            Collections.sort(data);


            treeMap = new TreeMap<Long, DataObject>();
            populateMap(treeMap);


            hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
            populateMap(hashMap);


            Procedure[] procedures = new Procedure[] {this::testArray, this::testTreeMap, this::testHashMap}; 
            String[] names = new String[]                {"array",       "tree ",            "hash "};


            for (int n=0; n<procedures.length; n++) {
                Procedure proc = procedures[n];
                long startTime = System.nanoTime();
                for (int i=0; i<numRep; i++) {
                    if (rebuildEachTime) {
                        Collections.shuffle(data);
                    }
                    proc.execute();
                }
                long endTime = System.nanoTime();
                long diff = endTime - startTime;

                System.out.println("proc "+names[n]+":\t"+(diff/1_000_000));
            }

        }

        void testHashMap() {
            if (rebuildEachTime) {
                hashMap = new HashMap<Long, SearchSpeedTest.DataObject>();
                populateMap(hashMap);
            }

            testMap(hashMap);
        }

        void testTreeMap() {

            if (rebuildEachTime) {
                treeMap = new TreeMap<Long, SearchSpeedTest.DataObject>();
                populateMap(treeMap);
            }

            testMap(treeMap);
        }

        void testMap(Map<Long, DataObject> map) {

            for (Long id: ids) {
                DataObject ret = map.get(id);
                ret.dummyCode();
            }

        }

        void populateMap(Map<Long, DataObject> map) {
            for (DataObject dataObj : data) {
                map.put(dataObj.getId(), dataObj);
            }
        }

        void testArray() {
            if (rebuildEachTime) {
                Collections.sort(data);
            }
            DataObject key = new DataObject();
            for (Long id: ids) {
                key.setId(id);
                DataObject ret = data.get(Collections.binarySearch(data, key));
                ret.dummyCode();
            }
        }

        public static void main(String[] args) {
            new SearchSpeedTest().testSpeeds();
        }
    }
Dariusz
  • 21,561
  • 9
  • 74
  • 114
  • 1
    Since OP doesn't seem to care about ordering, a `HashMap` is far more preferable than a `TreeMap` (better performance, and smaller footprint). – Andreas Jan 15 '16 at 09:32
  • Smaller footprint - that's true. But searches for `Long id` may be faster for a `TreeMap`, depending on the data. My gut says that they will in most cases. – Dariusz Jan 15 '16 at 10:18
  • Your testing code is far from perfect. Since you do not store the generated ids in your `ids` list you end up always accessing the element with `id==0`. If you add `ids.add(id);` in your preparation code the results look very different and clearly favor the solution with a `HashMap`. – Thomas Kläger Jan 15 '16 at 11:43
  • It was even worse. Your code did just this: generate some DataObjects with random ids and shuffle / sort them. Since the `ids` list was empty, each of the `for (Long id: ids) {..}` loops did nothing. – Thomas Kläger Jan 15 '16 at 12:21
  • Yes, I noticed. And it's fixed now. – Dariusz Jan 15 '16 at 12:35
2

HashMap will be more efficient in general, so use it whenever you don't care about the order of the keys.

when you want to keep your entries in Map sorted by key than use TreeMap but sorting will be overhead in your case as you dont want any particulate order.

Keval
  • 1,857
  • 16
  • 26
0

You can use a map if you have a good way to define the key of the map. In the worst case you can use your object as key and value.

As ordering is not important use a HashMap. To maintain the order in a TreeMap there is additional cost when adding an element, as it must be added at the correct position.

hotzst
  • 7,238
  • 9
  • 41
  • 64