Unable to read words from txt file and count the number of words

Question

I have a small project to code a Twitter crawler, and I have encountered some issues when analyzing the collected tweets.

The collected tweets are placed in a txt file. What I want to achieve is to count how many words there are in the txt file, the number of words that contain the word 'engineering', and the number of hashtags. Below is what I have tried so far:

import java.io.*;
import java.util.StringTokenizer;

public class TwitterAnalyzer {

public static void main(String args[]){
    try{

        String keyword = "Engineering";
        FileInputStream fInstream = new FileInputStream("C:\\Users\\Alan\\Documents\\NetBeansProjects\\TwitterCrawler\\"+keyword+"-data.txt");
        DataInputStream in = new DataInputStream(fInstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;


        int numberOfKeywords = 0;
        int numberOfWords = 0;
        int numberOfHashtags = 0;

        while((strLine = br.readLine()) != null){

            strLine = br.readLine();
            System.out.println(strLine);
            StringTokenizer st = new StringTokenizer(strLine, " \t\n\r\f.,;:!?\"");
            while(st.hasMoreTokens()){
                String word = st.nextToken();
                numberOfWords++;
                if(word.contains(keyword)){
                    numberOfKeywords++;
                }
                if(word.contains("#")){
                    numberOfHashtags++;
                }
            }
        }



        System.out.println(numberOfWords);
        System.out.println(numberOfKeywords);
        System.out.println(numberOfHashtags);
        br.close();

    }catch (FileNotFoundException fe){
        fe.printStackTrace();
        System.out.println("Unable to locate file");
        System.exit(-1);
    }catch (IOException ie){
        ie.printStackTrace();
        System.out.println("Unable to read file");
        System.exit(-1);
    }        


}
}

Here is the link to the txt file.

Any help is greatly appreciated!

`while((strLine = br.readLine()) != null){ strLine = br.readLine();` you invoke readLine() twice for each iteration. — Natalia, Oct 16 '15 at 12:26
What does `unable to read words` mean? Any specific error messages or unexpected results? Also, a `map` would be a better choice if you are looking for occurrences of individual words — sam, Oct 16 '15 at 12:27
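
To illustrate the first comment: the loop in the question calls `readLine()` once in the `while` condition and again in the body, so every other line is skipped and the second call can return `null` at the end of the file. Below is a minimal sketch of the same counting loop with a single `readLine()` per iteration. The class name `LineLoopSketch`, the `countText` wrapper, and the case-insensitive keyword comparison are assumptions added for illustration, not part of the original code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.StringTokenizer;

// Hypothetical helper class, not part of the original question.
class LineLoopSketch {

    // Returns {words, keyword hits, tokens containing '#'} for all lines of the reader.
    static int[] count(BufferedReader br, String keyword) throws IOException {
        int words = 0, keywords = 0, hashtags = 0;
        String lowerKeyword = keyword.toLowerCase();
        String line;
        // readLine() is called exactly once per iteration, in the loop condition.
        while ((line = br.readLine()) != null) {
            StringTokenizer st = new StringTokenizer(line, " \t\n\r\f.,;:!?\"");
            while (st.hasMoreTokens()) {
                String word = st.nextToken();
                words++;
                // Assumption: the keyword match should ignore case.
                if (word.toLowerCase().contains(lowerKeyword)) {
                    keywords++;
                }
                if (word.contains("#")) {
                    hashtags++;
                }
            }
        }
        return new int[] {words, keywords, hashtags};
    }

    // Convenience wrapper so the loop can be tried on in-memory text.
    static int[] countText(String text, String keyword) {
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            return count(br, keyword);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] r = countText("I love #Engineering\nengineering is fun!", "Engineering");
        System.out.println(r[0] + " words, " + r[1] + " keyword hits, " + r[2] + " hashtags");
    }
}
```

For a real file, `count` can be passed a `BufferedReader` wrapped around the file instead of a `StringReader`.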

score 1 · Answer 1 · answered Oct 16 '15 at 12:32

Try it this way; it should help:

import java.io.BufferedReader;
import java.io.FileReader;

public class CountWords {

    public static void main (String args[]) throws Exception {

       System.out.println ("Engineering");       
       FileReader fr = new FileReader ("c:\\Customer1.txt");        
       BufferedReader br = new BufferedReader (fr);     
       String line = br.readLine();
       int count = 0;
       while (line != null) {
          String []parts = line.split(" ");
          for( String w : parts)
          {
            count++;        
          }
          line = br.readLine();
       }         
       System.out.println(count);
    }
}

answered Oct 16 '15 at 12:32

Aditya Vyas-Lakhan


Hi Lakhan, thanks! It worked. Can you also show me how to check if an individual word contains "engineering" and "#"? – Alan1 Oct 16 '15 at 12:45
@Alan1 glad it helped. See here: http://stackoverflow.com/questions/17134773/to-check-if-string-contains-particular-word – Aditya Vyas-Lakhan Oct 16 '15 at 12:49
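
As a rough sketch of the follow-up question above (checking each word for "engineering" and "#"), the per-word tests could look like the code below. The class and method names are hypothetical, and two rules are assumptions made here: the keyword match ignores case, and a word only counts as a hashtag if it starts with `#` and has at least one more character.

```java
import java.util.Locale;

// Hypothetical helper class, for illustration only.
class WordChecks {

    // True if the word contains the keyword, ignoring case.
    static boolean containsKeyword(String word, String keyword) {
        return word.toLowerCase(Locale.ROOT).contains(keyword.toLowerCase(Locale.ROOT));
    }

    // Assumption: a hashtag starts with '#' and has at least one following character.
    static boolean isHashtag(String word) {
        return word.length() > 1 && word.startsWith("#");
    }

    // Returns {words, words containing the keyword, hashtags} for one line.
    static int[] countLine(String line, String keyword) {
        int words = 0, keywords = 0, hashtags = 0;
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return new int[] {0, 0, 0};
        }
        // Split on runs of whitespace so repeated spaces do not create empty words.
        for (String word : trimmed.split("\\s+")) {
            words++;
            if (containsKeyword(word, keyword)) {
                keywords++;
            }
            if (isHashtag(word)) {
                hashtags++;
            }
        }
        return new int[] {words, keywords, hashtags};
    }

    public static void main(String[] args) {
        int[] r = countLine("Studying #engineering at an Engineering school #", "engineering");
        System.out.println(r[0] + " words, " + r[1] + " with keyword, " + r[2] + " hashtags");
    }
}
```

`countLine` can be called inside the `while (line != null)` loop from the answer, adding each returned count to a running total.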

score 1 · Accepted Answer · answered Oct 16 '15 at 12:45

The following code prints 202, 14, and 22 (words, keyword matches, and hashtags, in that order):

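import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwitterAnalyzer {

public static void main(String args[]){
    try{
        // Lowercase keyword; each line is lowercased before matching,
        // so the keyword search is case-insensitive.
        String keyword = "engineering";
        Pattern keywordPattern = Pattern.compile(keyword);

        // A '#' followed by at least one letter, digit, or underscore.
        Pattern hashTagPattern = Pattern.compile("#[a-zA-Z0-9_]");

        FileInputStream fInstream = new FileInputStream("E:\\t.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(fInstream));
        String strLine;


        int numberOfKeywords = 0;
        int numberOfWords = 0;
        int numberOfHashtags = 0;

        // readLine() is called once per iteration, in the loop condition only.
        while((strLine = br.readLine()) != null){
            // Count every occurrence of the keyword in the line.
            Matcher  matcher = keywordPattern.matcher(strLine.toLowerCase());
            while (matcher.find())
                numberOfKeywords++;
            // Count whitespace-separated tokens as words.
            numberOfWords += strLine.split("\\s").length;
            // Count every hashtag in the line.
            matcher = hashTagPattern.matcher(strLine);
            while (matcher.find())
                numberOfHashtags++;
        }

        System.out.println(numberOfWords);
        System.out.println(numberOfKeywords);
        System.out.println(numberOfHashtags);
        br.close();

    }catch (FileNotFoundException fe){
        fe.printStackTrace();
        System.out.println("Unable to locate file");
        System.exit(-1);
    }catch (IOException ie){
        ie.printStackTrace();
        System.out.println("Unable to read file");
        System.exit(-1);
    }
}
}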
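
One detail of the word count above that may be worth checking against your data: `split("\\s")` splits on every single whitespace character, so consecutive spaces produce empty strings that are counted as words, and an empty line yields an array of length 1. The sketch below compares that with trimming first and splitting on `\\s+`. The class and method names are hypothetical, and whether the stricter count is what you want is an assumption.

```java
// Hypothetical comparison of two ways to count words in a line.
class SplitSketch {

    // Same expression as in the answer: one split per whitespace character.
    static int countSingleWhitespace(String line) {
        return line.split("\\s").length;
    }

    // Alternative: trim, treat blank lines as zero words, split on whitespace runs.
    static int countWhitespaceRuns(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        String line = "a  b"; // two spaces between the words
        System.out.println(countSingleWhitespace(line)); // prints 3
        System.out.println(countWhitespaceRuns(line));   // prints 2
        System.out.println(countSingleWhitespace(""));   // prints 1
        System.out.println(countWhitespaceRuns(""));     // prints 0
    }
}
```

If the tweet file contains blank lines or double spaces, the two approaches will give different totals.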

Hi Sayed, thank you so much for your help! You saved my day! But can you explain what the difference is between FileReader and FileInputStream? — Alan1, Oct 16 '15 at 15:41
Alan1, as defined in the Oracle Java docs, FileReader is meant for reading streams of characters. For reading streams of raw bytes, consider using a FileInputStream. I hope this also helps you: http://stackoverflow.com/questions/20927278/filereader-advantages-vs-fileinputstream-advantages#20927429 — Ahmed Sayed, Oct 16 '15 at 16:12
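
To make the byte/character distinction concrete, here is a small sketch that writes a short UTF-8 sample to a temporary file and then reads it back twice: once as raw bytes through `FileInputStream`, and once as characters through an `InputStreamReader` with an explicit charset. The class name, method names, and sample text are made up for illustration. `FileReader` plays the same role as the `InputStreamReader` here, but it decodes with the default charset unless you pass one (the constructors that accept a `Charset` exist from Java 11).

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not taken from the answers above.
class BytesVsChars {

    // Reads the file as raw bytes and counts them.
    static int countBytes(Path file) {
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes the same bytes as UTF-8 and counts the resulting chars.
    static int countChars(Path file) {
        try (Reader reader = new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text as UTF-8 to a temporary file that is removed on exit.
    static Path writeSample(String text) {
        try {
            Path file = Files.createTempFile("tweets", ".txt");
            Files.write(file, text.getBytes(StandardCharsets.UTF_8));
            file.toFile().deleteOnExit();
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" is four characters, but the accented letter takes two bytes in UTF-8.
        Path file = writeSample("caf\u00e9");
        System.out.println(countBytes(file) + " bytes, " + countChars(file) + " chars");
    }
}
```

For tweets, which often contain non-ASCII text, reading through a reader with an explicit charset avoids depending on the platform default.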
