We are using CF 11 on Windows Server 2012 R2. Given the structure of our data, we need to use either one very large Solr collection or about 500 small ones. What are the pros and cons of each approach? Is there a guide or reference on best practices for the number of collections? Any advice is greatly appreciated!
-
500 is probably OK-ish, but in general you're probably better off with one large collection and an explicit [document routing key](https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud) to make sure similar documents end up on the same nodes (i.e. all the documents belonging to a single user live on the same three nodes). – MatsLindh May 23 '17 at 18:57
-
Thank you. We are populating our collections from a database using queries, not documents, so the fields/records are all similar. I'm concerned that maintaining one large collection (updates, removals, etc.) might be tedious and require significant time and resources. Do you think that's the case? – E.Simsarian May 24 '17 at 11:55
-
Impossible to say. When I mention "documents", I'm talking about the Lucene/Solr concept. One row from your database will still be a single document. I'm not sure how you'd provide a routing key from DIH, but the documentation may cover it. – MatsLindh May 24 '17 at 12:17
-
Personally, I would suggest starting with 500 small Solr collections. That way, when you have to refresh a few collections, you can refresh only those selected few. Also, from my work experience, I can tell you that it is always easier to maintain small chunks of code rather than one page with all the logic in it. The only way to find out which approach is better is to try it out. – ah7866 Jun 02 '17 at 14:26
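
For comparison, the single-collection alternative with a routing key (raised in the comments above) could look roughly like the sketch below. It assumes a SolrCloud setup using the default compositeId router, where a document ID of the form `routekey!docid` decides which shard the document lands on, and the `_route_` query parameter limits a search to the matching shard(s). The field names (`user`, `pk`, `title_s`) and the helper functions are hypothetical and only illustrate the idea; nothing here talks to a real Solr instance.

```python
from urllib.parse import urlencode


def routed_id(route_key: str, doc_id: str) -> str:
    """Build a compositeId-style document ID: '<route_key>!<doc_id>'."""
    if "!" in route_key:
        # '!' is the separator, so it cannot appear inside the key itself.
        raise ValueError("route key must not contain '!'")
    return f"{route_key}!{doc_id}"


def routed_query_params(route_key: str, query: str) -> str:
    """Encode query parameters that restrict a search to one route key."""
    return urlencode({"q": query, "_route_": f"{route_key}!"})


# Hypothetical database rows; each row becomes one Solr document.
rows = [
    {"user": "user42", "pk": "1001", "title": "first row"},
    {"user": "user42", "pk": "1002", "title": "second row"},
    {"user": "user77", "pk": "2001", "title": "another user"},
]

# Documents sharing a route key hash to the same shard.
docs = [
    {"id": routed_id(row["user"], row["pk"]), "title_s": row["title"]}
    for row in rows
]

if __name__ == "__main__":
    for doc in docs:
        print(doc["id"])
    print(routed_query_params("user42", "title_s:first"))
```

With this layout, updating or removing one user's data touches only the documents carrying that prefix, which addresses part of the maintenance concern without splitting the index into hundreds of collections.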