
How can I count repeated words in a text file, using an array?

My program can print the total number of words in the file. But how can I get it to print the number of different words, and also print a list of how many times each word occurs, like this:

Cake: 4
a: 320
Piece: 2
of: 24

   (Uppercase and lowercase versions of a word are considered the same word.)

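import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

void FileReader() {
    System.out.println("Task A");
    int totalWords = 0;
    int uniqueWords = 0;                 // not computed yet
    String[] word = new String[35000];   // intended to hold every word read
    String[] wordC = new String[5000];   // intended to hold the distinct words (the assignment allows up to 5000)
    try (Scanner sc = new Scanner(new File("Alice.txt"))) {
        while (sc.hasNext()) {
            // sc.next() already splits the input on whitespace
            String words = sc.next();
            String[] space = words.split(" ");   // results not used yet
            String[] comma = words.split(",");   // results not used yet
            totalWords++;
        }
        System.out.println("Number of words read: " + totalWords);
    } catch (FileNotFoundException e) {
        System.out.println("File not found");
    }
}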
ThinkPink93

6 Answers


That would be very inefficient with an array, because after each word you would have to iterate through the array to see whether the word had already occurred. Instead, use a HashMap where the key is the word and the value is the number of occurrences. It's easier and faster to check whether a HashMap contains a key than to check whether an array contains an element.

EDIT:

HashMap<String, Integer>
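A minimal sketch of the HashMap approach, assuming words are whatever Scanner.next() returns (whitespace-separated tokens, so punctuation such as the comma in "cake," stays attached). The class name HashMapWordCount and the method countWords are hypothetical; Alice.txt is the file from the question.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class HashMapWordCount {

    // Hypothetical helper: counts how often each word occurs, ignoring case.
    static Map<String, Integer> countWords(Scanner sc) {
        Map<String, Integer> counts = new HashMap<>();
        while (sc.hasNext()) {
            String w = sc.next().toLowerCase();      // "Cake" and "cake" become one key
            if (counts.containsKey(w)) {
                counts.put(w, counts.get(w) + 1);    // seen before: increment its count
            } else {
                counts.put(w, 1);                    // first occurrence
            }
        }
        return counts;
    }

    public static void main(String[] args) throws FileNotFoundException {
        try (Scanner sc = new Scanner(new File("Alice.txt"))) {
            Map<String, Integer> counts = countWords(sc);
            System.out.println("Different words: " + counts.size());
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                System.out.println(e.getKey() + ": " + e.getValue());
            }
        }
    }
}
```

The number of different words is simply the map's size, so no separate counter is needed.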
Lucas

Try using a Set and checking the return value of add() as you iterate.

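import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Assumes 'word' holds only the words read (no empty null slots).
// Start with an EMPTY set: add() returns false if the element is already present.
Set<String> set = new HashSet<>();
int unique = 0;
for (String temp : word) {
    if (set.add(temp.toLowerCase())) {   // lower-case so "Cake" and "cake" count once
        unique++;
    }
}

//or, as an alternative (note: this version is case-sensitive)...
Set<String> set = new HashSet<>(Arrays.asList(word));
int unique = set.size();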

This assumes all the words have already been read into the array.

Edit: Since you can't use Maps (and presumably not other collections either), you may have to do it the somewhat gross way: compare each new word against every value already stored.

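// input == the new word just read from the text file
// wordC holds the distinct words found so far; unique is how many there are
boolean isUnique = true;
for (int i = 0; i < unique; i++) {
    if (wordC[i].equalsIgnoreCase(input)) {
        isUnique = false;
        break;   // no need to look further
    }
}
if (isUnique) {
    wordC[unique] = input;   // remember the new distinct word
    unique++;                // unique is the count of distinct words
}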
Rogue
  • But how do I add the words from the text file to an array and then check whether two or more words are the same? – ThinkPink93 Nov 12 '13 at 11:26
  • Could you clarify? Do you need the number of times each word appears, the number of unique words, or something else? You do this check by comparing the word you got from the text file to every word currently in the array. – Rogue Nov 12 '13 at 11:40
  • - Count how many times each word occurs. NOTE: You may assume that the file being read contains at most 5000 unique (distinct) words. - Then print one row for each unique word in the text that was read, with the word and the number of times it occurs. The order of the printed words is arbitrary. Example: cake: 4 a: 320 piece: 2 of: 24. Here's the assignment ^^ – ThinkPink93 Nov 12 '13 at 11:44
  • If you aren't able to use a map, I would honestly create a wrapper class for the word instead. That is, another class that stores the word and the number of times it occurs; then, for each word read in, iterate through those objects and compare the new word against the stored ones. – Rogue Nov 12 '13 at 11:50
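A minimal sketch of the wrapper-class idea using only a plain array, assuming (as the assignment allows) at most 5000 distinct words; a file with more would overflow the array. The names WrapperWordCount, WordCount, countWords and MAX_UNIQUE are hypothetical, and words are whitespace-separated tokens from Scanner.next().

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class WrapperWordCount {

    // Hypothetical wrapper class: one distinct word plus how often it occurred.
    static class WordCount {
        final String word;
        int count;

        WordCount(String word) {
            this.word = word;
            this.count = 1;
        }
    }

    static final int MAX_UNIQUE = 5000;   // limit stated in the assignment

    // Fills 'table' with distinct words and returns how many slots are used.
    static int countWords(Scanner sc, WordCount[] table) {
        int unique = 0;
        while (sc.hasNext()) {
            String w = sc.next();
            boolean found = false;
            // Linear scan over the distinct words stored so far
            for (int i = 0; i < unique; i++) {
                if (table[i].word.equalsIgnoreCase(w)) {
                    table[i].count++;
                    found = true;
                    break;
                }
            }
            if (!found) {
                table[unique] = new WordCount(w.toLowerCase());
                unique++;
            }
        }
        return unique;
    }

    public static void main(String[] args) throws FileNotFoundException {
        WordCount[] table = new WordCount[MAX_UNIQUE];
        try (Scanner sc = new Scanner(new File("Alice.txt"))) {
            int unique = countWords(sc, table);
            System.out.println("Different words: " + unique);
            for (int i = 0; i < unique; i++) {
                System.out.println(table[i].word + ": " + table[i].count);
            }
        }
    }
}
```

Each new word costs a scan over all distinct words so far, which is the inefficiency the first answer warns about, but it needs nothing beyond arrays.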

You can use a Map: each time you add a word that is already in the map, increment its value (the count).

Evgeniy Dorofeev

Every time you add a word, you need to check whether it already exists in your array. To compare words while ignoring case, use:

 word1.equalsIgnoreCase(word2);
rui.mendes
  • You would need to do that for every word in the array. – Rogue Nov 12 '13 at 10:55
  • @Rogue The OP did specify to ignore case. – Radiodef Nov 12 '13 at 11:00
  • @Radiodef Yes, I'm saying you would have to iterate through the entire array manually every time you added a word; nothing about case. Though, seeing the recent comment about Maps, I'm assuming this is homework, and I have adjusted my own answer as well. – Rogue Nov 12 '13 at 11:01

Try this:

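import java.io.File;
import java.io.FileNotFoundException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

try (Scanner sc = new Scanner(new File("Alice.txt"))) {
    List<String> list = new ArrayList<String>();
    int totalWords = 0;
    while (sc.hasNext()) {
        // sc.next() already splits on whitespace; lower-case so "Cake" and "cake" match
        list.add(sc.next().toLowerCase());
        totalWords++;
    }
    Set<String> uniqueSet = new HashSet<String>(list);
    System.out.println("Total words: " + totalWords);
    System.out.println("Different words: " + uniqueSet.size());
    System.out.println("Words with their frequency..");
    for (String word : uniqueSet) {
        // Collections.frequency scans the whole list once per distinct word
        System.out.println(word + ": " + Collections.frequency(list, word));
    }
} catch (FileNotFoundException e) {
    System.out.println("File not found");
}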
Jhanvi
  • The problem is that we're not allowed to use hashing; we have to solve it using simple arrays. Otherwise, thank you so much! – ThinkPink93 Nov 12 '13 at 11:12

You can improve on simple array searching using Arrays.sort and Arrays.binarySearch.

Essentially, for each word, use binarySearch to check whether it is already in your array. If it is, increment its count. If it is not, add it to the array and sort again. Java's current sort algorithm for object arrays, TimSort, is very fast when the array is already mostly sorted.
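A minimal sketch of the sorted-array idea, with one deliberate variation: instead of appending and re-sorting, it inserts the new word at the insertion point that Arrays.binarySearch encodes in its negative return value. Re-sorting the words alone would separate them from a parallel counts array; inserting in place keeps both arrays aligned. The class name SortedArrayWordCount and its members are hypothetical, and capacity must cover the number of distinct words (5000 per the assignment).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.Scanner;

public class SortedArrayWordCount {
    final String[] words;   // distinct words, kept sorted in words[0..size)
    final int[] counts;     // counts[i] is the number of occurrences of words[i]
    int size = 0;

    SortedArrayWordCount(int capacity) {
        words = new String[capacity];
        counts = new int[capacity];
    }

    void add(String raw) {
        String w = raw.toLowerCase();   // treat "Cake" and "cake" as the same word
        // Search only the filled, sorted part of the array
        int pos = Arrays.binarySearch(words, 0, size, w);
        if (pos >= 0) {
            counts[pos]++;              // already present: bump its count
            return;
        }
        int insertAt = -pos - 1;        // negative result encodes the insertion point
        // Shift the tail one slot right so the array stays sorted
        System.arraycopy(words, insertAt, words, insertAt + 1, size - insertAt);
        System.arraycopy(counts, insertAt, counts, insertAt + 1, size - insertAt);
        words[insertAt] = w;
        counts[insertAt] = 1;
        size++;
    }

    public static void main(String[] args) throws FileNotFoundException {
        SortedArrayWordCount wc = new SortedArrayWordCount(5000);   // assignment's limit
        try (Scanner sc = new Scanner(new File("Alice.txt"))) {
            while (sc.hasNext()) {
                wc.add(sc.next());
            }
        }
        System.out.println("Different words: " + wc.size);
        for (int i = 0; i < wc.size; i++) {
            System.out.println(wc.words[i] + ": " + wc.counts[i]);
        }
    }
}
```

Lookups become O(log n); each insertion still shifts part of the array, but System.arraycopy makes that cheap in practice, and the output comes out in alphabetical order as a side effect.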

There are other structures, such as TreeSet, that you could use to avoid hashing, but I suspect those would also be disallowed.
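For illustration, a minimal sketch of counting distinct words with a TreeSet, assuming that is allowed. A TreeSet decides equality with its comparator rather than with hashCode(), so passing String.CASE_INSENSITIVE_ORDER makes "Cake" and "cake" count as one word. The class name TreeSetUniqueWords and method countUnique are hypothetical.

```java
import java.util.Scanner;
import java.util.TreeSet;

public class TreeSetUniqueWords {

    // Counts distinct words without hashing: the TreeSet deduplicates
    // using the comparator, here one that ignores case.
    static int countUnique(Scanner sc) {
        TreeSet<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        while (sc.hasNext()) {
            seen.add(sc.next());   // no-op if an equal word (ignoring case) is already present
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countUnique(new Scanner("Cake a cake A a piece")));   // prints 3
    }
}
```

The set keeps the casing of the first occurrence it sees. Per-word counts would need the map counterpart, TreeMap, built with the same comparator.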

OldCurmudgeon