1

I want to read a file and collect the top n words by word frequency.

I tried the following code to count every word in a string.

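public static void main(String[] args) throws FileNotFoundException, IOException {
     StringBuilder text = new StringBuilder();
     // try-with-resources closes the reader even if reading fails
     try (BufferedReader br = new BufferedReader(new FileReader("txtFile.txt"))) {
         String sz;
         while ((sz = br.readLine()) != null) {
             // readLine() strips the line break, so add a space to keep the
             // last word of one line apart from the first word of the next
             text.append(sz).append(' ');
         }
     }
     System.out.println(text);
     // split on runs of whitespace so repeated spaces do not yield empty words
     String[] words = text.toString().trim().split("\\s+");
     String[] uniqueLabels = getLabels(words);

     for (String l : uniqueLabels) {
         if (l == null) {
             break; // unused slots at the end of the array are null
         }
         if (l.isEmpty()) {
             continue; // an empty file produces a single empty token
         }
         int count = 0;
         for (String s : words) {
             if (l.equals(s)) {
                 count++;
             }
         }
         System.out.println("Word :: " + l + " Count :: " + count);
     }
 }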

I used the following code, which I got from a link, to collect the unique labels (words):

private static String[] getLabels(String[] keys) {
      // Worst case every key is unique; unused trailing slots stay null.
      String[] uniqueKeys = new String[keys.length];

      uniqueKeys[0] = keys[0];
      int uniqueKeyIndex = 1; // number of unique keys stored so far
      boolean keyAlreadyExists = false;

      for (int i = 1; i < keys.length; i++) {
          // compare only against the slots that are already filled
          for (int j = 0; j < uniqueKeyIndex; j++) {
              if (keys[i].equals(uniqueKeys[j])) {
                  keyAlreadyExists = true;
                  break;
              }
          }

          if (!keyAlreadyExists) {
              uniqueKeys[uniqueKeyIndex] = keys[i];
              uniqueKeyIndex++;
          }
          keyAlreadyExists = false;
      }
      return uniqueKeys;
  }

This works fine. Now I want to collect the top 10 words, ranked by their frequency in the file.

Community
A J

3 Answers

3

First of all, if you want it to run moderately fast, don't loop through all the Strings in an array. Use a HashMap, or even find some map for primitives.

Then go through the words. If a word is already in the map, increment its value; otherwise put a 1. At the end, sort the map entries by count in descending order and fetch the first 10.

Not a total duplicate, but this answer pretty much shows how to get the counting done: Calculating frequency of each word in a sentence in java
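
The steps above can be sketched roughly as follows. This is only an illustration, not code from the linked answer: the class name WordFrequency, the method topWords and the sample sentence are made up for the example, and ties are broken alphabetically, which is an assumption the question does not specify.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequency {

    // Hypothetical helper: counts each word, then returns the n most frequent entries.
    public static List<Map.Entry<String, Integer>> topWords(List<String> words, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // skip empty tokens
            }
            // merge() stores 1 for a new word and adds 1 to an existing count
            counts.merge(word, 1, Integer::sum);
        }

        // A HashMap has no order, so copy its entries into a list and sort that.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            // highest count first
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // assumption: equal counts are ordered alphabetically
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // keep at most n entries
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        String text = "the cat and the dog and the bird";
        List<String> words = Arrays.asList(text.split("\\s+"));
        for (Map.Entry<String, Integer> e : topWords(words, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the sample sentence this prints "the 3" and then "and 2". Each word is looked up once in the map, so the counting is a single pass over the words instead of one pass per unique word.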

Community
Silverclaw
2

I recommend using a HashMap<String, Integer> to count the word frequency. A hash map stores key-value pairs: each key (your word) is unique, and its value (the count) can change. If you perform a put operation with an already existing key, the value will be updated.

Hashmap

Something like this should work:

hashmap.put(key, hashmap.get(key) + 1);
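
One caveat with the line above: get returns null for a key that is not in the map yet, and null + 1 throws a NullPointerException when the Integer is unboxed. A minimal sketch of a null-safe increment, assuming Java 8 or later for merge (the class name IncrementDemo and the method increment are made up for illustration):

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementDemo {

    // Hypothetical helper: adds one occurrence of key; a missing key starts at 1.
    public static void increment(Map<String, Integer> counts, String key) {
        if (counts.containsKey(key)) {
            counts.put(key, counts.get(key) + 1);
        } else {
            counts.put(key, 1);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        increment(counts, "word");
        increment(counts, "word");
        // Java 8+ shorthand with the same effect as increment()
        counts.merge("word", 1, Integer::sum);
        System.out.println(counts.get("word")); // prints 3
    }
}
```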

To get the top ten words, I would sort the hashmap entries by count and retrieve the first ten.

Silverclaw
fridayswag
  • Err...I just had another thought. Since Integer is a reference type, if the key does not exist, you will have null + 1. – Silverclaw Jul 29 '16 at 08:26
  • Try-catch blocks are for catching exceptions, not for handling something that will happen on a regular basis. Every time a new word is encountered, the JVM would have to create a stack trace and handle an exception. Use containsKey instead. – Silverclaw Jul 29 '16 at 08:41
  • Actually I looked it up... it was an interesting read: [link](http://stackoverflow.com/questions/299068/how-slow-are-java-exceptions) – fridayswag Jul 29 '16 at 08:47
0

I solved it as follows:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Scanner;

public class wordFreq {
private static String[] w = null; // top-n words, most frequent first
private static int[] r = null;    // counts matching the words in w
public static void main(String[] args){
    try {
        System.out.println("Enter 'n' value :: ");
        Scanner in = new Scanner(System.in);
        int n = in.nextInt();
        w = new String[n];
        r = new int[n]; // int arrays start out filled with 0
        StringBuilder text = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("acq.txt"))) {
            String sz;
            while((sz=br.readLine())!=null){
                // readLine() drops the line break, so add a space between lines
                text.append(sz).append(' ');
            }
        }
        // split on runs of whitespace so repeated spaces give no empty words
        String[] words = text.toString().trim().split("\\s+");
        String[] uniqueLabels = getUniqLabels(words);
        for(String l: uniqueLabels)
        {
            if(null == l)
            {
                break; // unused slots at the end of the array are null
            }
            if(l.isEmpty())
            {
                continue; // an empty file yields one empty token
            }
            int count = 0;
            for(String s : words)
            {
                if(l.equals(s))
                {
                    count++;
                }
            }

            for(int i=0; i<n; i++){
                if(count>r[i]){
                    // shift lower-ranked entries down so none is overwritten
                    for(int k=n-1; k>i; k--){
                        r[k] = r[k-1];
                        w[k] = w[k-1];
                    }
                    r[i] = count;
                    w[i] = l;
                    break;
                }
            }
        }
        display(n);
    } catch (Exception e) {
        System.err.println("ERR "+e.getMessage());
    }
}

public static void display(int n){
    for(int k=0; k<n; k++){
        System.out.println("Label :: "+w[k]+"\tCount :: "+r[k]);
    }
}

private static String[] getUniqLabels(String[] keys)
{
    // Worst case every key is unique; unused trailing slots stay null.
    String[] uniqueKeys = new String[keys.length];

    uniqueKeys[0] = keys[0];
    int uniqueKeyIndex = 1; // number of unique keys stored so far
    boolean keyAlreadyExists = false;

    for(int i=1; i<keys.length ; i++)
    {
        // compare only against the slots that are already filled
        for(int j=0; j<uniqueKeyIndex; j++)
        {
            if(keys[i].equals(uniqueKeys[j]))
            {
                keyAlreadyExists = true;
                break;
            }
        }

        if(!keyAlreadyExists)
        {
            uniqueKeys[uniqueKeyIndex] = keys[i];
            uniqueKeyIndex++;
        }
        keyAlreadyExists = false;
    }
    return uniqueKeys;
}

}

The sample output is:

Enter 'n' value :: 
5
Label :: computer   Count :: 30
Label :: company    Count :: 22
Label :: express    Count :: 20
Label :: offer  Count :: 16
Label :: shearson   Count :: 16
A J