I am running Lucene/Solr 4 to test different features, including clustering. Currently, 1 million documents are indexed. Every document has the following fields:
ID (unique key):
    Example1: 10245
    Example2: 24974
TOPIC (keywords of the document):
    Example1: "disaster/japan/nuclear power station"
    Example2: "world/japan/nuclear power"
HEADLINE (1 line of text):
    Example1: "explosion at nuclear power plant in japan"
    Example2: "news about japans nuclear power plant"
TEXT (the full text):
    "In the Japanese nuclear power plant in Fukushima..."
All fields are indexed and stored, except TEXT, which is only indexed, not stored. I use the following configuration for clustering:
<str name="carrot.title">TOPIC</str>
<str name="carrot.snippet">HEADLINE</str>
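For context, here is a minimal sketch of where these two parameters would typically sit in solrconfig.xml. It assumes the stock /clustering request handler from the Solr example configuration. The handler name, the engine name "default" and the clustering.* values are assumptions for illustration, not taken from the setup described here.

```xml
<!-- Sketch only: handler name, engine name and clustering.* values are assumed -->
<requestHandler name="/clustering" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- enable the clustering component and cluster the search results -->
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- the two field mappings from this question -->
    <str name="carrot.title">TOPIC</str>
    <str name="carrot.snippet">HEADLINE</str>
  </lst>
  <!-- run the clustering search component after the regular query -->
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

With a mapping like this, Carrot would see TOPIC as the document title and HEADLINE as the document content.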
If you look at the examples, you can see that the TOPIC values differ, but both contain "japan". Is it possible to configure Solr/Carrot so that Example1 and Example2 end up in the same cluster because of the matching "japan"?
Furthermore, there could be a third TOPIC such as "news/nuclear power", which does not contain "japan", while its HEADLINE and TEXT do use the words "japans power plant". Which Solr/Carrot configuration is relevant to get all three news items into one cluster?
Thank you!