I'm actually developing a system where you input some text files to a StandardAnalyzer, and the contents of that file are then replaced by the output of the StandardAnalyzer (which tokenizes and removes all the stop words). The code ive developed till now is :
File f = new File(path);
TokenStream stream = analyzer.tokenStream("contents",
new StringReader(readFileToString(f)));
CharTermAttribute charTermAttribute = stream.getAttribute(CharTermAttribute.class);
while (stream.incrementToken()) {
String term = charTermAttribute.toString();
System.out.print(term);
}
//Following is the readFileToString(File f) function
StringBuilder textBuilder = new StringBuilder();
String ls = System.getProperty("line.separator");
Scanner scanner = new Scanner(new FileInputStream(f));
while (scanner.hasNextLine()){
textBuilder.append(scanner.nextLine() + ls);
}
scanner.close();
return textBuilder.toString();
The readFileToString(f) is a simple function which converts the file contents to a string representation. The output i'm getting are the words each with the spaces or the new line between them removed. Is there a way to preserve the original spaces or the new line characters after the analyzer output, so that i can replace the original file contents with the filtered contents of the StandardAnalyzer and present it in a readable form?