How to get word count from a text file/folder in java (without changing the read order from the folder)

Question

The code below reads the .txt files in a folder (say the folder contains 2000+ text files) and prints the total number of words in each text document.

When the folder contains only 10-30 text files, the output appears in the correct order, one line per file.

But when I put 2000+ text files in the folder and read them all at once, the output order breaks down (the counts appear in what looks like random order).

Can anyone suggest how to solve this?

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FilenameFilter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.commons.io.FileUtils;

public class duplicatestrings
{
public static void main(String[] args) 
{
    FilenameFilter filter = new FilenameFilter() {
        public boolean accept(File dir, String name) {
            return name.endsWith(".txt");
        }
    };

    File folder = new File("E:\\testfolder");
    File[] listOfFiles = folder.listFiles(filter);

    for (int i = 0; i < listOfFiles.length; i++) {
        File file1 = listOfFiles[i];
        try {
            String content = FileUtils.readFileToString(file1);
             // System.out.println("asssdffsssssssssss = " + content);
        } catch (IOException e) {

            e.printStackTrace();
        }

        BufferedReader ins = null;
        try {
            ins = new BufferedReader (
                    new InputStreamReader(
                        new FileInputStream(file1)));
        } catch (FileNotFoundException e) {

            e.printStackTrace();
        }

        String line = "", str = "";

        int a = 0;
        int b = 0;
        try {
            while ((line = ins.readLine()) != null) {
            str += line + " ";
            b++;
            }
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
     //   System.out.println("Total number of lines " +b);

     //System.out.println(str);

    /*    int count =0;
        try {
            String input = ins.readLine();
            String[] array = input.split(" ");
            System.out.print("\nPlease enter word to be counted :");
            String key = ins.readLine();
            for(int s=0;i < array.length;i++){
                if(array[s].equals(key))
                    count++;
            }
            System.out.print("\n The given word occured " + count + " times");
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }*/





        StringTokenizer st = new StringTokenizer(str);
        while (st.hasMoreTokens()) {
        String s = st.nextToken();
        a++;

        }

 // List<String> list = Arrays.asList(str.split(" "));

      //  Set<String> uniqueWords = new HashSet<String>(list);
       // for (String word : uniqueWords) {
        //    System.out.println(word + a+ "\n"  + Collections.frequency(list, word));}
        System.out.println(" Total no of words=" + a );

    }
}
}

I also need only the counts (not the words themselves) of distinct and repeated words across all the text files in the folder.

Suggestions welcome.

Please describe the output you want in more detail. Do you want the word count for each of the `.txt` files separately? Also, what is this order you are talking about? Is it the alphabetical order of the files by filename, as you see them in Windows Explorer? — STaefi, Jan 18 '16 at 07:23
Yes, I want the word count separately for each .txt file. The output is displayed randomly: some text files are arranged in random order. — Ram Ki, Jan 18 '16 at 07:31
If I inspect (file1) it shows only 1000 files, but my folder has 2000+ files. Yet the "total no of words" output gives a count for all 2000+ files. — Ram Ki, Jan 18 '16 at 07:35
You didn't answer my question about the order you want. In what order? Do you mean the alphabetical order of the filenames or something else? When you say "the output is randomly displayed", what are you comparing the output order with? — STaefi, Jan 18 '16 at 07:35
Something unrelated to your question: why are you reading the file with `String content = FileUtils.readFileToString(file1);` and then re-reading it with `BufferedReader`? You are not using the variable `content` in the rest of the code. — Yazan, Jan 18 '16 at 07:38
Also, what do you mean by `the output arrangement is collapsed.(it displays in random order)`? As far as I can see, your output has only the total without the file name, so how can you know it's not correct? Keep in mind that `folder.listFiles(filter)` may not list the files in the same order you see them in the OS's `File Explorer`. — Yazan, Jan 18 '16 at 07:46
I want the alphabetical order as displayed in File Explorer. I cross-checked it. — Ram Ki, Jan 18 '16 at 07:59
If you want alphabetical order, then you have to sort `listOfFiles` alphabetically. Start by printing the file name with the count and see what you get, then sort the array if needed. — Yazan, Jan 18 '16 at 08:02

score 0 · Answer 1 · edited May 23 '17 at 12:00

After you count the words in each file, you can insert the results into a TreeMap, then display them in a consistent order. The key is the filename, the value is the word count. See: how to sort Map values by key in Java
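
A minimal sketch of the sorted-map idea, assuming the per-file counts have already been computed. The class name `SortedWordCounts`, the `report` helper, and the sample filenames and counts are made up for illustration:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class SortedWordCounts {
    // Builds "filename: count" lines in ascending filename order.
    // A TreeMap keeps its keys sorted, so the insertion order does not matter.
    static String report(Map<String, Integer> countsByFile) {
        Map<String, Integer> sorted = new TreeMap<>(countsByFile);
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : sorted.entrySet()) {
            out.append(e.getKey()).append(": ").append(e.getValue()).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up sample data, inserted deliberately out of order.
        Map<String, Integer> counts = new HashMap<>();
        counts.put("c.txt", 12);
        counts.put("a.txt", 7);
        counts.put("b.txt", 30);
        System.out.print(report(counts));
    }
}
```

Note that `TreeMap` with `String` keys sorts by plain string comparison, which is case-sensitive; passing `String.CASE_INSENSITIVE_ORDER` to the `TreeMap` constructor is one way to change that.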

Or you can sort the filenames in the folder, and count the words in the sorted file list: how to File.listFiles in alphabetical order?
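
A rough sketch of the second option: sort the array returned by `listFiles` before counting, and print each filename next to its count so the order can be checked. The class name `OrderedWordCount` and the helpers `countWords` and `sortByName` are made up for illustration, and reading the files as UTF-8 is an assumption (the question's code uses the platform default encoding):

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.StringTokenizer;

public class OrderedWordCount {
    // Counts whitespace-separated tokens, like the StringTokenizer loop in the question.
    static int countWords(String text) {
        return new StringTokenizer(text).countTokens();
    }

    // Sorts in place by filename, ignoring case.
    static void sortByName(File[] files) {
        Arrays.sort(files, (x, y) -> x.getName().compareToIgnoreCase(y.getName()));
    }

    public static void main(String[] args) throws IOException {
        // Folder path taken from the question; can be overridden on the command line.
        File folder = new File(args.length > 0 ? args[0] : "E:\\testfolder");
        File[] files = folder.listFiles((dir, name) -> name.endsWith(".txt"));
        if (files == null) {
            System.err.println("Not a readable directory: " + folder);
            return;
        }
        // listFiles() makes no promise about order, so sort explicitly.
        sortByName(files);
        for (File f : files) {
            // Assumption: the files are UTF-8 text.
            String content = new String(Files.readAllBytes(f.toPath()), StandardCharsets.UTF_8);
            System.out.println(f.getName() + " Total no of words=" + countWords(content));
        }
    }
}
```

This gives plain alphabetical order. A file manager may order names that contain numbers differently (for example `file2.txt` before `file10.txt`), in which case a custom comparator would be needed.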

answered Jan 18 '16 at 07:41

Gavriel


score 0 · Answer 2 · edited Jun 20 '20 at 09:12

I think the logic below will help you: add file-reading code to it and replace the "test" variable with each line from the file.

It counts the total number of words, and the number of words with repeats counted only once:

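    import java.util.HashSet;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class WordCount {
        public static void main(String[] args) {
            String test = "I am trying to make make make";
            // A "word" here is a run of letters, digits or underscores.
            Pattern p = Pattern.compile("\\w+");
            Matcher m = p.matcher(test);
            // A set stores each word once, so its size is the distinct-word count.
            Set<String> distinctWords = new HashSet<>();
            int total = 0;
            while (m.find()) {
                total++;
                distinctWords.add(m.group());
            }
            System.out.println("Total words Count==" + total);
            System.out.println("Count without Repetition ==" + distinctWords.size());
        }
    }

Output :

Total words Count==7

Count without Repetition ==5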
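
As a hedged sketch of how the same regex logic might be extended to a whole folder: the version below keeps a frequency table across all `.txt` files, then reports how many distinct words there are and how many of them occur more than once. Treating "repeated" as "occurs more than once across the folder" is my reading of the question, not something it states. The class name `FolderWordStats` and the helpers `tally` and `repeatedCount` are made up for illustration, and UTF-8 is an assumed encoding:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FolderWordStats {
    private static final Pattern WORD = Pattern.compile("\\w+");

    // Adds every \w+ match in text to the running frequency table.
    static void tally(String text, Map<String, Integer> freq) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            freq.merge(m.group(), 1, Integer::sum);
        }
    }

    // Number of distinct words that occur more than once.
    static int repeatedCount(Map<String, Integer> freq) {
        int repeated = 0;
        for (int n : freq.values()) {
            if (n > 1) {
                repeated++;
            }
        }
        return repeated;
    }

    public static void main(String[] args) throws IOException {
        // Folder path taken from the question; can be overridden on the command line.
        Path folder = Paths.get(args.length > 0 ? args[0] : "E:\\testfolder");
        Map<String, Integer> freq = new HashMap<>();
        try (DirectoryStream<Path> txtFiles = Files.newDirectoryStream(folder, "*.txt")) {
            for (Path p : txtFiles) {
                // Assumption: the files are UTF-8 text.
                tally(new String(Files.readAllBytes(p), StandardCharsets.UTF_8), freq);
            }
        }
        System.out.println("Distinct words = " + freq.size());
        System.out.println("Repeated words = " + repeatedCount(freq));
    }
}
```

The comparison is case-sensitive, so "Make" and "make" count as different words; lower-casing each match before tallying would merge them.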

Hope this helps :)
