-1

I have a list of words, 1000 words, I should list them from the most occurring to the least occurring.

Like:

 Dog, 100 times
 Cat, 50 times
 Fish, 40 times
 Monkey, 10 times
 Bird, 10 times
 Camel, 10 times
 .
 .
 .
 Lion, 1 times
 Tiger, 1 times

I did this and works with a while loop, but it takes like 10 seconds, the next part of the task is to use Threads and make the sorting in less time. I plan on using 5 Threads, I can use them and run individually, say Thread1 can sort 1-200, Thread2 can sort 201-400, Thread3 can sort 401-600... but then in the end I would have 5 different lists? There would be 10 Dogs on Thread1 list, 20 Dogs in Thread2 list... Mixed up on console... I would like it to be as on above example using 5 Threads, is it possible? Could you please give some tips, I am new to Threads.

Edit: I'm using the built in sort functionality, for the time being it isn't important which sort algorithm I'm using. The task is not to use the best sorting algorithm but to sort with Threads.

Code:

//This is the list
    ArrayList<String> animalList = new ArrayList<String>();

//This is the map from the list
    Map<String, Integer> map = new HashMap<String, Integer>();
    for (String temp : animalList) {
        Integer count = map.get(temp);
        map.put(temp, (count == null) ? 1 : count + 1);
    }

//This is the final map
    TreeMap<String, Integer> sortedMap = sortMapByValue(map); 


public static TreeMap<String, Integer> sortMapByValue(Map<String, Integer> map){
    Comparator<String> comparator = new ValueComparator(map);
    TreeMap<String, Integer> result = new TreeMap<String, Integer>(comparator);
    result.putAll(map);
    return result;
}


public class ValueComparator implements Comparator<String>{

    HashMap<String, Integer> map = new HashMap<String, Integer>();

    public ValueComparator(Map<String, Integer> map2){
        this.map.putAll(map2);
    }

    @Override
    public int compare(String s1, String s2) {
        if(map.get(s1) >= map.get(s2)){
            return -1;
        }else{
            return 1;
        }   
    }
}
Anarkie
  • 657
  • 3
  • 19
  • 46
  • 1
    What sorting algorithm? Might be the first place to look to optimize rather than trying to speed it up with multithreading. – copeg Jun 30 '16 at 17:22
  • 3
    there is no way this should take anywhere near 100 seconds. You're doing something grossly inefficient somewhere – Cruncher Jun 30 '16 at 17:23
  • @Cruncher Im not trying to sort cats and dogs... 100 is just an example. – Anarkie Jun 30 '16 at 17:25
  • 1
    First of all, you should use a divide-and-conquer algorithm such as [quicksort](https://en.wikipedia.org/wiki/Quicksort) or [merge sort](https://en.wikipedia.org/wiki/Merge_sort). They are easier to implement with multiple threads. – The SE I loved is dead Jun 30 '16 at 17:25
  • @Anarkie Are you seeing "dog" and then looping through the whole list counting how many dogs. And then seeing "cat" and looping through the whole list counting how many cats? Because that's the wrong approach. – Cruncher Jun 30 '16 at 17:28
  • 1
    if it takes you 100 seconds to sort 1000 string, then your solution is terrible. If you are using java, just use the built in sort functionality; it is almost a guarantee that you will never develop something that is better. If you are doing this as a learning experience, then write the sort yourself and compare it (the sort time) to the built in java sort as a measure of "quality". Also, as mentioned by dorukayhan, read about merge sort. – DwB Jun 30 '16 at 17:28
  • It looks like you're passing in map keys to the compare algorithm and then comparing values. Consider instead of using a map making a list of objects (AnimalCount {string name; int count} for example). Then you can use List and probably get better performance. Also sometimes with very large lists its easier to keep it sorted as you add to it (see http://stackoverflow.com/questions/8725387/why-is-there-no-sortedlist-in-java for good ways to keep your list sorted) – Gus Jun 30 '16 at 17:47
  • You said yourself that creating the list (or map) that contains the counts is what takes 10 seconds. That's what you need to speed up, not the sorting. Show us the code you're using to create the counts list. – Jim Mischel Jun 30 '16 at 20:41
  • @JimMischel I added how I create the list. – Anarkie Jul 01 '16 at 10:10

1 Answers1

1

Mostly threads in Java do not execute simultaneously (unless you have a thread per core) and what happens is that the flow is constantly changing between threads and thus if the outcome depends on the order of operations it becomes extremely unpredictable quite fast.

There are some ways to avoid this. One of them is synchronization. That is (putting it simple) that you don't let other threads access some parts of your code until another thread is done with it. This solution can make that your program ends up in a deadlock. This wouldn't really help you a lot, since if you stop your threads when another is say sorting your list, then you gain nothing from using threads.

What you can do is try to use threads in a way in which the outcome doesn't depend on the order of execution. You could, for example, have a thread taking care of the first 200 words, another of the next 200 and so on. Then you should only combine the results in a recursive merge-sort like fashion.


Threads are an excellent way to improve a program's execution time. But... if you need around 100 seconds to sort a list of thousand words your algorithm can be improved.

What you could do is start out by improving your code by using a (e.g- alphabetic) sorting algorithm first and have your list sorted by name (you can do that in O(n·ln(n)) with for example merge-sort, quick-sort or heap-sort). Once you have your list sorted you only need O(n) once to extract your frequencies by going once above the list and another O(m·ln(m)), where m is the length of the frequency list, to order that list in descending frequency order.

All in all you could have your results in O(n·ln(n)+n+m·ln(m)), which in the worst case scenario is O(2·n·ln(n) + n) (if no two words are equal). This is still O(n·ln(n)).

All computers can calculate something of the order O(n·ln(n)) in less than 100 sec :P

Pol
  • 77
  • 6