
How can I index documents by applying StandardTokenizer, LowerCaseFilter, and EdgeNGramFilter using Lucene 5.2.0?

itzmebibin
iNikkz

1 Answer


Try this Solr field-type configuration:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
</fieldType>
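As a rough illustration of what this chain emits per token (assuming minGramSize="1" and maxGramSize="15", and ignoring tokenization details), front edge n-grams of a lowercased token can be sketched in plain Java; the EdgeNgramDemo class here is only an approximation for illustration, not the Lucene filter itself:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class EdgeNgramDemo {
    // Front ("side=front") edge n-grams of a single lowercased token,
    // mirroring what LowerCaseFilter + EdgeNGramFilter produce per token.
    static List<String> edgeNgrams(String token, int minGram, int maxGram) {
        String t = token.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, t.length()); n++) {
            grams.add(t.substring(0, n));
        }
        return grams;
    }

    public static void main(String[] args) {
        // "Lucene" -> [l, lu, luc, luce, lucen, lucene]
        System.out.println(edgeNgrams("Lucene", 1, 15));
    }
}
```

These prefix grams are what make prefix/autocomplete-style matching work at query time.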

Or with the Lucene Java API (in Lucene 5.x the chain is built by overriding Analyzer.createComponents):

@Override
protected TokenStreamComponents createComponents(String fieldName) {
    // Lucene 5.x: createComponents no longer takes a Reader, and
    // EdgeNGramTokenFilter's Side parameter was removed (front is the default).
    Tokenizer source = new StandardTokenizer();
    TokenStream result = new StandardFilter(source);
    result = new LowerCaseFilter(result);
    result = new EdgeNGramTokenFilter(result, 1, 20);
    return new TokenStreamComponents(source, result);
}
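The snippet above defines only the token chain; wiring it into an IndexWriter under Lucene 5.2 could look like the following minimal sketch (assuming the lucene-core and lucene-analyzers-common 5.2.0 jars are on the classpath; the RAMDirectory, field name "content", and sample text are illustrative):

```java
import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class IndexWithEdgeNgrams {
    public static void main(String[] args) throws IOException {
        // Same chain as above: StandardTokenizer -> LowerCaseFilter -> EdgeNGramTokenFilter
        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName) {
                Tokenizer source = new StandardTokenizer();
                TokenStream result = new LowerCaseFilter(source);
                result = new EdgeNGramTokenFilter(result, 1, 15);
                return new TokenStreamComponents(source, result);
            }
        };

        // In-memory index for illustration; use FSDirectory for a persistent index.
        Directory dir = new RAMDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
            Document doc = new Document();
            doc.add(new TextField("content", "Indexing with Lucene 5.2", Field.Store.YES));
            writer.addDocument(doc);
        }
    }
}
```

In Lucene 5.x, IndexWriterConfig takes the Analyzer directly (the Version argument was dropped in 5.0), and IndexWriter is Closeable, so try-with-resources commits and closes the index.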

Check this link.

Vinod
  • Thanks but I need in lucene not in solr. I need example in java code using lucene – iNikkz Apr 27 '16 at 11:02
  • okay, the below-mentioned link might be helpful:

        Analyzer analyzer = new Analyzer() {
            @Override
            protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
                Tokenizer source = new StandardTokenizer(VERSION, reader);
                TokenStream filter = new LowerCaseFilter(VERSION, source);
                return new TokenStreamComponents(source, filter);
            }
        };

    [https://lingpipe-blog.com/2014/03/08/lucene-4-essentials-for-text-search-and-indexing/] – Vinod Apr 27 '16 at 11:16