I am trying to compare the contents of documents using Solr. I do this by simply using the entire document contents as a query. This works until the documents get large. A document can contain 15k words or more, which results in a max boolean clauses exception; that limit defaults to 1024. I could of course increase this value, but even if I raise it to 5k, it would still be impossible to compare documents with large contents.
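To make the scale of the problem concrete, here is a rough sketch of the arithmetic. It assumes that every whitespace-separated word in the document ends up as one boolean clause, which is a simplification of what the query parser actually does; the function names are hypothetical and only for illustration.

```python
# Rough illustration only: assumes each whitespace-separated word
# becomes one boolean clause when the document is used as a query.
MAX_BOOLEAN_CLAUSES = 1024  # default limit mentioned in the question


def count_clauses(text: str) -> int:
    """Count the words that would each turn into one clause (assumption)."""
    return len(text.split())


def exceeds_limit(text: str, limit: int = MAX_BOOLEAN_CLAUSES) -> bool:
    """Return True when the document, used as a query, would pass the limit."""
    return count_clauses(text) > limit


if __name__ == "__main__":
    small_doc = "word " * 500      # a short document
    large_doc = "word " * 15000    # roughly the size of my largest documents
    print(exceeds_limit(small_doc))
    print(exceeds_limit(large_doc))
    print(exceeds_limit(large_doc, limit=5000))  # still too large at 5k
```

Under that assumption, a 15k-word document is well past both the default limit and a raised limit of 5k.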
Is Lucene even suitable for such tasks? If so, what should I do to make this work? If not, what would be an alternative way of comparing the contents of one document with other documents?