How can I index documents by applying StandardTokenizer, LowerCaseFilter and EdgeNGramFilter using Lucene 5.2.0?
1 Answer
Try this. With Solr, you would define the analysis chain as a field type:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- the "side" attribute was removed before 5.x; edge n-grams are always taken from the front -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
</fieldType>
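In plain Lucene 5.x you can build the same chain from those factory names with CustomAnalyzer (added in 5.0). A minimal sketch; the gram sizes simply mirror the config above:

import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

// StandardTokenizer -> LowerCaseFilter -> EdgeNGramFilter, looked up by factory name
static Analyzer buildAnalyzer() throws IOException {
    return CustomAnalyzer.builder()
            .withTokenizer("standard")
            .addTokenFilter("lowercase")
            .addTokenFilter("edgeNGram", "minGramSize", "1", "maxGramSize", "15")
            .build();
}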
Or with a hand-written Analyzer in Java, updated for the Lucene 5.x API:
Analyzer analyzer = new Analyzer() {
    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        // Lucene 5.x: no Reader parameter here; Lucene sets it on the Tokenizer later
        Tokenizer source = new StandardTokenizer();
        TokenStream result = new StandardFilter(source);
        result = new LowerCaseFilter(result);
        // Side.FRONT is gone in 5.x; edge n-grams are always taken from the front
        result = new EdgeNGramTokenFilter(result, 1, 20);
        return new TokenStreamComponents(source, result);
    }
};
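To actually index documents with this analyzer, a minimal sketch (the index path and the "text" field name are just placeholders; error handling omitted):

import java.nio.file.Paths;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

Directory dir = FSDirectory.open(Paths.get("/tmp/myindex")); // placeholder path
IndexWriterConfig config = new IndexWriterConfig(analyzer);  // analyzer from above
try (IndexWriter writer = new IndexWriter(dir, config)) {
    Document doc = new Document();
    doc.add(new TextField("text", "Example document body", Field.Store.YES));
    writer.addDocument(doc); // analyzed with the edge n-gram chain at index time
}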
Check this link.
- Thanks, but I need this in Lucene, not in Solr. I need an example in Java code using Lucene. – iNikkz Apr 27 '16 at 11:02
- Okay, the link below might be helpful:
  Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
          Tokenizer source = new StandardTokenizer(VERSION, reader);
          TokenStream filter = new LowerCaseFilter(VERSION, source);
          return new TokenStreamComponents(source, filter);
      }
  };
  [https://lingpipe-blog.com/2014/03/08/lucene-4-essentials-for-text-search-and-indexing/] – Vinod Apr 27 '16 at 11:16