I have a small project to code a Twitter crawler, and I have encountered some issues when analyzing the collected tweets.
The collected tweets are saved to a txt file. What I want to achieve is to count the total number of words in the txt file, the number of words that contain the word 'engineering', and the number of hashtags. Below is what I have tried so far:
import java.io.*;
import java.util.Locale;
import java.util.StringTokenizer;

public class TwitterAnalyzer {
    public static void main(String[] args) {
        String keyword = "Engineering";
        // Lower-case once so "engineering", "Engineering" and "#Engineering" all match.
        String needle = keyword.toLowerCase(Locale.ROOT);
        String path = "C:\\Users\\Alan\\Documents\\NetBeansProjects\\TwitterCrawler\\" + keyword + "-data.txt";
        int numberOfKeywords = 0;
        int numberOfWords = 0;
        int numberOfHashtags = 0;
        // try-with-resources closes the reader even if an exception is thrown.
        try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(path)))) {
            String strLine;
            // Read each line exactly once: a second readLine() inside the loop body
            // would skip every other line and could hand null to StringTokenizer.
            while ((strLine = br.readLine()) != null) {
                System.out.println(strLine);
                StringTokenizer st = new StringTokenizer(strLine, " \t\n\r\f.,;:!?\"");
                while (st.hasMoreTokens()) {
                    String word = st.nextToken();
                    numberOfWords++;
                    if (word.toLowerCase(Locale.ROOT).contains(needle)) {
                        numberOfKeywords++;
                    }
                    // Count only tokens that begin with '#', not every token containing one.
                    if (word.startsWith("#")) {
                        numberOfHashtags++;
                    }
                }
            }
        } catch (FileNotFoundException fe) {
            fe.printStackTrace();
            System.out.println("Unable to locate file");
            System.exit(-1);
        } catch (IOException ie) {
            ie.printStackTrace();
            System.out.println("Unable to read file");
            System.exit(-1);
        }
        System.out.println("Words: " + numberOfWords);
        System.out.println("Words containing \"" + keyword + "\": " + numberOfKeywords);
        System.out.println("Hashtags: " + numberOfHashtags);
    }
}
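One way to make the counting easier to check is to separate the per-line logic from the file I/O, so it can be exercised on a few sample strings without the real tweet file. The sketch below assumes a case-insensitive match is wanted for the keyword and treats a "hashtag" as any token that starts with `#` and has at least one character after it; both are assumptions, not rules from the question. `TweetStats`, `addLine` and `count` are hypothetical names. The key point for the reading loop is that `readLine()` is called exactly once per iteration, in the loop condition.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.StringTokenizer;

public class TweetStats {
    // Running totals for the three counts the question asks for.
    public int words = 0;
    public int keywordWords = 0;
    public int hashtags = 0;

    // Same delimiter set as the question's StringTokenizer ('#' is deliberately not a delimiter).
    private static final String DELIMITERS = " \t\n\r\f.,;:!?\"";

    /** Tokenizes one line and updates the totals. */
    public void addLine(String line, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        StringTokenizer st = new StringTokenizer(line, DELIMITERS);
        while (st.hasMoreTokens()) {
            String word = st.nextToken();
            words++;
            // Assumption: case-insensitive substring match, so "#Engineering" also counts.
            if (word.toLowerCase(Locale.ROOT).contains(needle)) {
                keywordWords++;
            }
            // Assumption: a lone "#" is not a hashtag.
            if (word.startsWith("#") && word.length() > 1) {
                hashtags++;
            }
        }
    }

    /** Reads every line exactly once: readLine() appears only in the loop condition. */
    public static TweetStats count(BufferedReader br, String keyword) throws IOException {
        TweetStats stats = new TweetStats();
        String line;
        while ((line = br.readLine()) != null) {
            stats.addLine(line, keyword);
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        String sample = "Loving #Engineering today\nsoftware engineering is fun, #java\n";
        TweetStats s = count(new BufferedReader(new StringReader(sample)), "engineering");
        // prints "8 2 2"
        System.out.println(s.words + " " + s.keywordWords + " " + s.hashtags);
    }
}
```

Because `addLine` has no I/O, the same object can be fed lines from the real file, from a `StringReader` in a test, or from any other source.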
Here is the link to the txt file.
Any help is greatly appreciated!
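Since the tweets are stored in a txt file, it may also matter how that file is decoded. An `InputStreamReader` created without a charset uses the JVM's default charset, which before Java 18 is platform-dependent (on Windows often not UTF-8), while tweets frequently contain non-ASCII characters. The sketch below assumes the crawler wrote the file as UTF-8 and opens it with `Files.newBufferedReader` in a try-with-resources block. It only counts lines, to keep the focus on the I/O; `ReadTweets`, `countLines` and the default file name are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadTweets {
    /** Counts the lines of a file decoded as UTF-8; the reader is closed automatically. */
    public static int countLines(Path file) throws IOException {
        int lines = 0;
        // try-with-resources closes the reader even if readLine() throws.
        try (BufferedReader br = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (br.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical default: pass the crawler's output file as the first argument instead.
        Path file = Paths.get(args.length > 0 ? args[0] : "Engineering-data.txt");
        try {
            System.out.println("Lines: " + countLines(file));
        } catch (NoSuchFileException e) {
            System.err.println("Unable to locate file: " + file);
            System.exit(1);
        } catch (IOException e) {
            System.err.println("Unable to read file: " + e);
            System.exit(1);
        }
    }
}
```

If the file is not actually valid UTF-8, `readLine()` on this reader throws a `MalformedInputException` (an `IOException`) rather than silently producing garbled text, which makes an encoding mismatch easy to spot.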