1

I am trying to resolve sorting a map , which contains huge data(1000K). Is there any efficient way than this to sorting these maps ? below is the code snippet.

    Map<Integer, String> myMap1 = new HashMap<Integer, String>();
    Map<String,Integer>  myMap2 = new HashMap< String,Integer>();

    List <Entry<Integer,String>> lst1 = new ArrayList<Entry<Integer,String>>(myMap1.entrySet());
    Collections.sort(lst1, new Comparator<Entry<Integer,String>>(){
        @Override
        public int compare(Entry e1, Entry e2)
        {
            return ((String) e1.getValue()).compareTo((String) e2.getValue());
        }}
    );


    List <Entry<String,Integer>> lst2 = new ArrayList<Entry<String,Integer>>(myMap2.entrySet());        
    Collections.sort(lst2, new Comparator<Entry<String,Integer>>(){
        @Override
        public int compare(Entry e1, Entry e2)
        {
            return ((Integer) e1.getValue()).compareTo((Integer) e2.getValue());
        }}
    );
M A
  • 71,713
  • 13
  • 134
  • 174
  • Are you getting the data from database? – dsharew May 11 '15 at 16:15
  • 4
    Sounds like an XY problem. Why do you need to sort so much data in memory? – Reimeus May 11 '15 at 16:15
  • 1
    And what *exactly* do you mean by "1000K"? 1 million *entries*? – Jon Skeet May 11 '15 at 16:18
  • a) Are these lookup dictionaries where the `myMap2` is the "reverse" of `myMap1`? In other words, do the entries in both maps have the same integer and string pairs? b) Would a sorted list of the strings be enough for your task? This might speed things a little bit. – DarkDust May 11 '15 at 16:19
  • @Reimeus that is the requirement :) – Narayana Sai May 11 '15 at 16:26
  • @jon Skeet , yep 1 M entries – Narayana Sai May 11 '15 at 16:26
  • @codehx getting data from database but sorting should be done in memory not at database level. – Narayana Sai May 11 '15 at 16:28
  • 1
    @NarayanaSai I don't know who told you this, but if the field is indexed then it will be better to do the sorting in database rather than in the app, unless the sorting algorithm is more complex than a simple data comparison, which is not in this case. – Luiggi Mendoza May 11 '15 at 16:30
  • @LuiggiMendoza , just for sorting do i need to use database connection ? – Narayana Sai May 11 '15 at 16:44
  • 1
    If you obtain the data from database, obtain it sorted already. Only sort on app when: 1) the field to sort is not indexed and the database may take lot of time doing the sort by itself, or 2) the sorting algorithm is more complex than a single field comparison. – Luiggi Mendoza May 11 '15 at 16:45

1 Answers1

1

IMO a priority queue can also be a good approach:

Map<Integer, String> myMap1 = new HashMap<Integer, String>();
PriorityQueue<Entry<Integer, String>> pq = new PriorityQueue<Map.Entry<Integer,String>>(myMap1.size(), new Comparator<Entry<Integer, String>>() {
    @Override
    public int compare(Entry<Integer, String> arg0, Entry<Integer, String> arg1) {
        return arg0.getValue().compareTo(arg1.getValue());
    }
});
pq.addAll(myMap1.entrySet());
while (!pq.isEmpty()) {
    System.out.println(pq.poll());
}

Also Google Guava can be a good option as it provides a BiMap implementations which can be inversed, and then just sort on inversed map keys.

 Map<Integer, String> myMap1 = new HashMap<Integer, String>();
    // insert values in myMap
    Map<String,Integer>  myMap2 = myMap1.inverse();
    SortedMap<Integer, Character> sortedInversed = new TreeMap<Integer, Character>(myMap2 );
akhil_mittal
  • 23,309
  • 7
  • 96
  • 95
  • 3
    You will be better by using a `TreeMap` rather than `HashMap`. `TreeMap` is already sorted. – Luiggi Mendoza May 11 '15 at 16:20
  • In TreeMap insertion will be O(lg n) where as it would be O(1) in Map. Also IMO `Set` and `Map` do not handle duplicates well. – akhil_mittal May 11 '15 at 16:21
  • 1
    *IMO Set and Map do not handle duplicates well* looks like you haven't used them in the right way then, but that's aside of the problem. Also, inserting in the `PriorityQueue` will take time O(lg n), so you have to pay that price in order to sort, regardless of the data structure you choose. – Luiggi Mendoza May 11 '15 at 16:23
  • A map doesn't allow duplicates. And `TreeMap` uses a red black tree, which is a balanced tree so you won't have the `LinkedList` situation as you describe. I really don't know what you're talking about. – Luiggi Mendoza May 11 '15 at 16:25
  • Sorry I meant they don't handle duplicates well in case of keys. – akhil_mittal May 11 '15 at 16:26
  • Yes TreeMap uses RedBlackTree but it will have its own associated cost of balancing for insertion/deletion etc. – akhil_mittal May 11 '15 at 16:27
  • And that costs is O(1), so there's no big associated cost on it. In this way, both `TreeMap` and `PriorityQueue` will behave and perform the same, so there's no great improvement by using a different structure to *sort* the data. In fact, I'm starting to think that using a `Map` here is a very bad idea to begin with. – Luiggi Mendoza May 11 '15 at 16:29
  • In that case it seems we can't have values with the same key in `TreeMap` whereas in a priority queue I think we can. That is the only plus point, IMO. – akhil_mittal May 11 '15 at 16:34