0

So here is the program counting the appearance of the most occurring word in a text document. However it's super slow because of the get(i) method I suppose. Any ideas how to make it faster? I know that if I'll use array it will be faster, but I want it to stay as linked list, and if possible just change the get(i) part.

import java.io.File;
    import java.util.Scanner;
    import java.util.Map.Entry;
    import java.util.AbstractMap;
    import java.util.LinkedList;

public class wordcount {


    public static Entry<String, Integer> count_LINKED_LIST(String[] tokens) {
        LinkedList<Entry<String, Integer>> list = new LinkedList<Entry<String, Integer>>();
        for (int j = 0; j < tokens.length; j++) {
            String word = tokens[j];
            boolean found = false;
            for (int i = 0; i < list.size(); i++) {
                Entry<String, Integer> e = list.get(i);

                if (word.equals(e.getKey())) {
                    e.setValue(e.getValue() + 1);
                    list.set(i, e);
                    found = true;
                    break;
                }
            }
            if (!found)
                list.add(new AbstractMap.SimpleEntry<String, Integer>(word, 1));
        }

        int maxCount = 0;
        String maxWord = "";
        for (int i = 0; i < list.size(); i++) {
            int count = list.get(i).getValue();
            if (count > maxCount) {
                maxWord = list.get(i).getKey();
                maxCount = count;
            }
        }
        return new AbstractMap.SimpleEntry<String, Integer>(maxWord, maxCount);
    }

    static String[] readText(String PATH) throws Exception {
        Scanner doc = new Scanner(new File(PATH)).useDelimiter("[^a-zA-Z]+");
        int length = 0;
        while (doc.hasNext()) {
            doc.next();
            length++;
        }

        String[] tokens = new String[length];
        Scanner s = new Scanner(new File(PATH)).useDelimiter("[^a-zA-Z]+");
        length = 0;
        while (s.hasNext()) {
            tokens[length] = s.next().toLowerCase();
            length++;
        }
        doc.close();

        return tokens;
    }

    public static void main(String[] args) throws Exception {

        String PATH = "/Users/username/foldername/textdocument.txt";
        String[] tokens = readText(PATH);
        long startTime = System.currentTimeMillis();
        Entry<String, Integer> entry = count_LINKED_LIST(tokens);
        long endTime = System.currentTimeMillis();
        String time = String.format("%12d", endTime - startTime);
        System.out.println("time\t" + time + "\t" + entry.getKey() + ":" + entry.getValue());
    }

}
  • Have you tried using a for each loop? – NielsNet Sep 23 '18 at 18:23
  • 1
    `get(int i)` on a `LinkedList` is very slow, since it has to scan from the beginning to locate index `i`. To make it faster, use an `ArrayList`. However, the actually get a fast implementation of *"counting the appearance of the most occurring word"*, use a `Map`. – Andreas Sep 23 '18 at 18:27
  • 2
    LinkedList is the worst collection you could choose for this task (for basically any task in fact), **and** you use it in an extremely inefficient way by using indices to access elements, first to get elements, then to set them (although that is completely unnecessary). Use the appropriate data structure: a HashMap. – JB Nizet Sep 23 '18 at 18:28
  • You can't make it faster because a `LinkedList` is just not appropriate for what you are trying to do. Why not just use a `HashMap`? – Falla Coulibaly Sep 23 '18 at 18:28
  • 1
    @FallaCoulibaly it could be way faster by using an iterator loop or a foreach loop. But a HashMap is indeed much more appropriate. – JB Nizet Sep 23 '18 at 18:29
  • Why do you only want to use `LinkedList`? There are some appropriate solutions. – Sam Sep 23 '18 at 18:33
  • @JBNizet Are you saying that linked lists are useless? – jannis Sep 23 '18 at 18:36
  • @jannis its use is strongly discouraged within Google's code (ArrayList or ArrayDeque are recommended in preference). – Andy Turner Sep 23 '18 at 18:39
  • @jannis yes. I think I've yet to find a use-case where it would be a better choice than an ArrayList. – JB Nizet Sep 23 '18 at 18:43
  • 1
    @JBNizet I disagree that "linked lists are the worst collection for basically any task in fact". Linked lists couldn't be more fundamental to programming and are used in implementing hashes and task queues in your OS, for example. It's true that it's not ideal for *this* task, and perhaps you're talking about Java's LL collections implementation specifically. Use of a LinkedList instead of an ArrayList might be appropriate whenever you need to imply access to front and back elements, as in Queues and Stacks. If you're using the `list.get` method, it's probably the wrong structure. – ggorlen Sep 23 '18 at 18:56
  • ArrayList (for back only access) and ArrayDeque (for front and back access) would be better choices. The locality of the data and the very fast array operations make it faster than linked lists. Food for thoughts: https://twitter.com/DrDeprecator/status/668447303481495552 (Stuart Marks is part of the JDK team at Oracle) – JB Nizet Sep 23 '18 at 19:06
  • @JBNizet For example ArrayDeque doesn't implement the List interface so it's not a straightforward replacement. ArrayList on the other hand is backed by an array and may cause performance problems (both memory and CPU) when resized intensively during application lifetime. What I'm saying is that it's there for a reason and saying it's useless is a bit of an overstatement. – jannis Sep 23 '18 at 19:11
  • It's there because it's a classical data structure that everybody learns in school. But studies, made by much more serious people than I am, show that there is always a better choice. It's there for a reason, but nowadays, the reason is history. – JB Nizet Sep 23 '18 at 19:17
  • You can just iterate over it using an iterator or the foreach syntax. You don't need the set(i, e) as it is already that value. – rghome Sep 24 '18 at 09:26

1 Answers1

0

you could use a map for that (token is key, token occurences is value) and use Java8+ Stream API:

public static Map<String, Long> count(String[] tokens) { 
    return Arrays.stream(tokens).collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
}