
I have a small project to write a Twitter crawler, and I have encountered some issues when analyzing the collected tweets.

The collected tweets are placed in a txt file. What I want to achieve is to count how many words there are in the txt file, the number of words that contain the word 'engineering', and the number of hashtags. Below is what I have tried so far:

import java.io.*;
import java.util.StringTokenizer;

public class TwitterAnalyzer {

public static void main(String args[]){
    try{

        String keyword = "Engineering";
        FileInputStream fInstream = new FileInputStream("C:\\Users\\Alan\\Documents\\NetBeansProjects\\TwitterCrawler\\"+keyword+"-data.txt");
        DataInputStream in = new DataInputStream(fInstream);
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        String strLine;


        int numberOfKeywords = 0;
        int numberOfWords = 0;
        int numberOfHashtags = 0;

        while((strLine = br.readLine()) != null){

            strLine = br.readLine();
            System.out.println(strLine);
            StringTokenizer st = new StringTokenizer(strLine, " \t\n\r\f.,;:!?\"");
            while(st.hasMoreTokens()){
                String word = st.nextToken();
                numberOfWords++;
                if(word.contains(keyword)){
                    numberOfKeywords++;
                }
                if(word.contains("#")){
                    numberOfHashtags++;
                }
            }
        }



        System.out.println(numberOfWords);
        System.out.println(numberOfKeywords);
        System.out.println(numberOfHashtags);
        br.close();

    }catch (FileNotFoundException fe){
        fe.printStackTrace();
        System.out.println("Unable to locate file");
        System.exit(-1);
    }catch (IOException ie){
        ie.printStackTrace();
        System.out.println("Unable to read file");
        System.exit(-1);
    }        


}
}

Here is the link to the txt file.

Any help is greatly appreciated!

Aditya Vyas-Lakhan
Alan1
  • `while((strLine = br.readLine()) != null){ strLine = br.readLine();` You invoke readLine() twice in each iteration. – Natalia Oct 16 '15 at 12:26
  • What's `unable to read words`? Any specific error messages or unexpected results? Also, a `map` would be a better choice if you are looking for occurrences of individual words. – sam Oct 16 '15 at 12:27
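The two comments above can be combined into one small sketch: `readLine()` is called only in the loop condition, so no line is skipped, and a `Map` keeps a count per word. The class name `WordFrequency`, the method `count`, and the sample text are hypothetical, and lower-casing each token is an assumption made here so that "Engineering" and "engineering" are counted together.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical class and method names, used only for illustration.
public class WordFrequency {

    // Counts how often each lower-cased token occurs in the supplied text.
    public static Map<String, Integer> count(Reader source) throws IOException {
        Map<String, Integer> freq = new TreeMap<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            // readLine() appears only in the loop condition, so no line is skipped.
            while ((line = br.readLine()) != null) {
                // Same delimiter set as in the question's code.
                StringTokenizer st = new StringTokenizer(line, " \t\n\r\f.,;:!?\"");
                while (st.hasMoreTokens()) {
                    freq.merge(st.nextToken().toLowerCase(), 1, Integer::sum);
                }
            }
        }
        return freq;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the tweet file so the sketch runs anywhere.
        String sample = "Engineering is fun\nI love #engineering and engineering";
        System.out.println(count(new StringReader(sample)));
    }
}
```

To read the real file instead, a `FileReader` for the tweet file could be passed to `count` in place of the `StringReader`.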

2 Answers

Try it this way; it should help:

import java.io.BufferedReader;
import java.io.FileReader;

public class CountWords {

    public static void main (String args[]) throws Exception {

       System.out.println ("Engineering");       
       FileReader fr = new FileReader ("c:\\Customer1.txt");        
       BufferedReader br = new BufferedReader (fr);     
       // Read the first line; the loop below continues until readLine() returns null.
       String line = br.readLine();
       int count = 0;
       while (line != null) {
          String []parts = line.split(" ");
          for( String w : parts)
          {
            count++;        
          }
          line = br.readLine();
       }         
       System.out.println(count);
    }
}
Aditya Vyas-Lakhan
  • Hi Lakhan, thanks! It worked. Can you also show me how to check if an individual word contains "engineering" and "#"? – Alan1 Oct 16 '15 at 12:45
  • @Alan1 Glad it helped. See here: http://stackoverflow.com/questions/17134773/to-check-if-string-contains-particular-word – Aditya Vyas-Lakhan Oct 16 '15 at 12:49
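For the follow-up question about checking individual words, a minimal sketch using `String.contains` and `String.startsWith` might look like the following. The class `WordChecks` and its two methods are hypothetical names. Treating a hashtag as "`#` followed by at least one more character" and ignoring case for the keyword are assumptions, not requirements stated in the thread.

```java
// Hypothetical helper; class and method names are illustrative only.
public class WordChecks {

    // True if the word contains the keyword, ignoring case.
    public static boolean containsKeyword(String word, String keyword) {
        return word.toLowerCase().contains(keyword.toLowerCase());
    }

    // True if the word looks like a hashtag: '#' followed by at least one more character.
    public static boolean isHashtag(String word) {
        return word.length() > 1 && word.startsWith("#");
    }

    public static void main(String[] args) {
        String line = "Studying #Engineering at an engineering school";
        int keywords = 0;
        int hashtags = 0;
        // Split on runs of whitespace and test each word separately.
        for (String w : line.split("\\s+")) {
            if (containsKeyword(w, "engineering")) keywords++;
            if (isHashtag(w)) hashtags++;
        }
        System.out.println(keywords + " keyword matches, " + hashtags + " hashtags");
    }
}
```

Inside the answer's `for (String w : parts)` loop, the same two checks could be applied to `w` to keep separate counters next to `count`.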
The following code returns: 202, 14, 22

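import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Place this method inside a class, for example TwitterAnalyzer from the question.
public static void main(String args[]){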
    try{
        String keyword = "engineering";
        Pattern keywordPattern = Pattern.compile(keyword);

        Pattern hashTagPattern = Pattern.compile("#[a-zA-Z0-9_]");

        FileInputStream fInstream = new FileInputStream("E:\\t.txt");
        BufferedReader br = new BufferedReader(new InputStreamReader(fInstream));
        String strLine;


        int numberOfKeywords = 0;
        int numberOfWords = 0;
        int numberOfHashtags = 0;

        while((strLine = br.readLine()) != null){
            Matcher  matcher = keywordPattern.matcher(strLine.toLowerCase());
            while (matcher.find())
                numberOfKeywords++;
            numberOfWords += strLine.split("\\s").length;
            matcher = hashTagPattern.matcher(strLine);
            while (matcher.find())
                numberOfHashtags++;
        }

        System.out.println(numberOfWords);
        System.out.println(numberOfKeywords);
        System.out.println(numberOfHashtags);
        br.close();

    }catch (FileNotFoundException fe){
        fe.printStackTrace();
        System.out.println("Unable to locate file");
        System.exit(-1);
    }catch (IOException ie){
        ie.printStackTrace();
        System.out.println("Unable to read file");
        System.exit(-1);
    }
}
Ahmed Sayed
  • Hi Sayed, thank you so much for your help! You saved my day! But can you explain the difference between FileReader and FileInputStream? – Alan1 Oct 16 '15 at 15:41
  • Alan1, as defined in the Oracle Java docs, FileReader is meant for reading streams of characters. For reading streams of raw bytes, consider using a FileInputStream. I hope this also helps you: http://stackoverflow.com/questions/20927278/filereader-advantages-vs-fileinputstream-advantages#20927429 – Ahmed Sayed Oct 16 '15 at 16:12
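The byte/character distinction described above can be seen in a short sketch. The class `BytesVsChars`, its methods, and the sample text are hypothetical. The example writes a temporary UTF-8 file so it can run without the tweet file, and wraps the byte stream in an `InputStreamReader` with an explicit charset, which is the same layering the second answer uses (there with the platform default charset).

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class; names are illustrative only.
public class BytesVsChars {

    // Counts raw bytes: FileInputStream knows nothing about characters.
    public static int countBytes(String path) throws IOException {
        int n = 0;
        try (InputStream in = new FileInputStream(path)) {
            while (in.read() != -1) n++;
        }
        return n;
    }

    // Counts decoded characters: the InputStreamReader turns bytes into chars using UTF-8.
    public static int countChars(String path) throws IOException {
        int n = 0;
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8))) {
            while (br.read() != -1) n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        // "\u00e9" is 'é', which takes two bytes in UTF-8 but is a single character.
        Path tmp = Files.createTempFile("tweets", ".txt");
        Files.write(tmp, "caf\u00e9 #tag".getBytes(StandardCharsets.UTF_8));
        System.out.println("bytes: " + countBytes(tmp.toString()));
        System.out.println("chars: " + countChars(tmp.toString()));
        Files.delete(tmp);
    }
}
```

For text such as tweets, the character-oriented reader is usually the one to count words with, since a byte-level view can split a multi-byte character.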