I would like to tokenize strings on `.`, `,`, `;` and similar delimiters, but keep email addresses, IP addresses and the like intact as single tokens. How do I use a Lucene analyzer for this? The following code, which I found on Stack Overflow, does not preserve email addresses. Pointers to documentation on how to specify patterns for Lucene's StandardAnalyzer would also be helpful. Thanks!
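    String text = "Lucene is simple yet powerful java based search library. sitaraman@dataguise.com";
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
    // LuceneConstants.CONTENTS is an application-defined constant holding the field name
    TokenStream tokenStream = analyzer.tokenStream(
            LuceneConstants.CONTENTS, new StringReader(text));
    // TermAttribute is deprecated since Lucene 3.1 (CharTermAttribute replaces it)
    TermAttribute term = tokenStream.addAttribute(TermAttribute.class);
    tokenStream.reset(); // reset the stream before the first incrementToken() call
    while (tokenStream.incrementToken()) {
        System.out.print("[" + term.term() + "] ");
    }
    tokenStream.end();
    tokenStream.close();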
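For reference, as far as I understand, since Lucene 3.1 `StandardTokenizer` follows the Unicode UAX#29 word-break rules and no longer treats email addresses as single tokens. `ClassicAnalyzer` keeps the older email-aware behavior, and `UAX29URLEmailTokenizer` emits URLs and emails as whole tokens. The idea behind a regex-driven alternative can be sketched in plain Java without Lucene: match the "keep intact" patterns first, then fall back to runs of characters that are not `.`, `,`, `;` or whitespace. Lucene's `PatternTokenizer` can emit each regex match as a token when configured with group 0, so a similar regex could plausibly be reused there. The class name `PreservingTokenizer` is hypothetical, and both the email and IPv4 regexes are simplified assumptions: they do not cover every valid address, and the IPv4 pattern does not range-check octets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: a regex-based tokenizer that keeps emails and IPv4 addresses whole.
public class PreservingTokenizer {
    // Alternatives are tried left to right at each position, so the email and
    // IPv4 patterns win over the generic "word" pattern when they match.
    private static final Pattern TOKEN = Pattern.compile(
            "[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}" // simplified email pattern
            + "|\\b(?:\\d{1,3}\\.){3}\\d{1,3}\\b"             // simplified IPv4 (no 0-255 check)
            + "|[^\\s.,;]+");                                 // text between . , ; and whitespace

    // Returns every regex match in order; delimiters themselves are dropped.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Lucene is simple yet powerful java based search library. "
                + "sitaraman@dataguise.com; server at 10.0.0.1, done.";
        for (String t : tokenize(text)) {
            System.out.print("[" + t + "] ");
        }
        System.out.println();
    }
}
```

Because the delimiters are expressed as a negated character class in the last alternative, adding another delimiter (for example `:`) only means extending `[^\\s.,;]`.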