I'm new to Lucene. I'm currently using 9.4.1 on Ubuntu, through PyLucene.

Whenever I look at the javadocs, I very often see 2 kinds of analyzer class. One is the base class, the other is the matching factory. For example, this page lists all the core analyzers. There are almost always 2 kinds, e.g. LowerCaseFilter vs. LowerCaseFilterFactory. I know they take different parameters: the base one takes a TokenStream, the factory takes a map.

What is this concept of a factory? Looking at the parameters each one takes, it seems like the base class (e.g. LowerCaseFilter) is suited to building a custom analyzer, whereas the factory (e.g. LowerCaseFilterFactory) is suited to being used directly on a string.

Could someone explain this to a noob?

user2773013
  • **The concept**: In software/programming, a factory class (or factory method) is a class (or method) whose job is to create an object for you, so that you don't need to create it yourself, explicitly, in your own code. Basically, you don't (in Java) write `new Foo()`, you use `FooFactory` to do that for you, instead. There are [certain advantages](https://stackoverflow.com/q/929021/12567365) to doing this. – andrewJames May 01 '23 at 23:14
  • **In Lucene**: I can't write a good answer because I never feel the need to use its tokenizer and filter factories. I'm generally OK using the actual classes: (Java): `Tokenizer source = new ICUTokenizer();`. In PyLucene it's even more succinct - just `source = ICUTokenizer()`. I think a good answer would explain when it's a clear advantage to use them, over the base classes - and I can't really write a compelling answer (except to list the advantages given in the above link). – andrewJames May 01 '23 at 23:14
  • You will see factory classes used in [Solr](https://solr.apache.org/guide/6_6/charfilterfactories.html), for configuration tasks. – andrewJames May 01 '23 at 23:16
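
To make the comments above a bit more concrete, here is a minimal Java sketch of the two styles. It deliberately does not use Lucene itself: `TokenSource`, `LowerCaseStage`, `TruncateStage`, the two `...StageFactory` classes and `forName` are hypothetical stand-ins that only mimic the shape described in the question (the plain class wraps an upstream stream; the factory is built from a `Map<String, String>` and then creates the plain class). What it tries to show is that a factory still needs an upstream stream to wrap, so it is not really a shortcut for processing a string. What it adds is that a filter chain can be described as names plus string parameters, which is roughly the configuration-driven style Solr relies on.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class FactorySketch {

    // Hypothetical stand-in for a token stream: anything that yields tokens.
    interface TokenSource { List<String> tokens(); }

    // "Base" style: the constructor takes the upstream stream directly.
    static final class LowerCaseStage implements TokenSource {
        private final TokenSource input;
        LowerCaseStage(TokenSource input) { this.input = input; }
        public List<String> tokens() {
            List<String> out = new ArrayList<>();
            for (String t : input.tokens()) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        }
    }

    // A second base-style stage that needs a typed parameter.
    static final class TruncateStage implements TokenSource {
        private final TokenSource input;
        private final int prefixLength;
        TruncateStage(TokenSource input, int prefixLength) {
            this.input = input;
            this.prefixLength = prefixLength;
        }
        public List<String> tokens() {
            List<String> out = new ArrayList<>();
            for (String t : input.tokens()) {
                out.add(t.length() > prefixLength ? t.substring(0, prefixLength) : t);
            }
            return out;
        }
    }

    // "Factory" style: configured once from strings, then wraps any stream.
    interface StageFactory { TokenSource create(TokenSource input); }

    static final class LowerCaseStageFactory implements StageFactory {
        LowerCaseStageFactory(Map<String, String> args) {
            if (!args.isEmpty()) throw new IllegalArgumentException("Unknown parameters: " + args);
        }
        public TokenSource create(TokenSource input) { return new LowerCaseStage(input); }
    }

    static final class TruncateStageFactory implements StageFactory {
        private final int prefixLength;
        TruncateStageFactory(Map<String, String> args) {
            // The factory turns the string setting into the typed constructor argument.
            this.prefixLength = Integer.parseInt(args.getOrDefault("prefixLength", "5"));
        }
        public TokenSource create(TokenSource input) { return new TruncateStage(input, prefixLength); }
    }

    // Hypothetical lookup by name, so a chain can come from a config file.
    static StageFactory forName(String name, Map<String, String> args) {
        switch (name) {
            case "lowercase": return new LowerCaseStageFactory(args);
            case "truncate":  return new TruncateStageFactory(args);
            default: throw new IllegalArgumentException("Unknown stage: " + name);
        }
    }

    // Base-class style: the chain is hard-coded in Java.
    public static List<String> direct(List<String> words) {
        TokenSource source = () -> words;
        return new LowerCaseStage(source).tokens();
    }

    // Factory style: the chain is data (stage name -> string parameters), applied in order.
    public static List<String> configured(List<String> words, Map<String, Map<String, String>> config) {
        TokenSource current = () -> words;
        for (Map.Entry<String, Map<String, String>> e : config.entrySet()) {
            current = forName(e.getKey(), e.getValue()).create(current);
        }
        return current.tokens();
    }

    public static void main(String[] args) {
        List<String> words = List.of("Hello", "WORLD");
        System.out.println(direct(words)); // prints [hello, world]

        Map<String, Map<String, String>> config = new LinkedHashMap<>();
        config.put("lowercase", Map.of());
        config.put("truncate", Map.of("prefixLength", "3"));
        System.out.println(configured(words, config)); // prints [hel, wor]
    }
}
```

If the chain is fixed and written in code, the base-class style (`direct`) is shorter, which matches the comment above about rarely needing the factories. The factory style starts to pay off once the chain has to be assembled from names and string settings that are not known at compile time.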

0 Answers