I am processing an entire Solr index of 80 million documents using pagination.
I learned from here that it is a bad idea to use the start parameter for pagination on a very large index like this. Instead, I should use a cursor mark, with code like the following:
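import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

SolrQuery q = new SolrQuery("*:*");   // match every document in the index
q.setRows(1000);                      // page size (example value)
q.setSort("id", SolrQuery.ORDER.asc); // a cursor's sort must include the uniqueKey field
String cursorMark = CursorMarkParams.CURSOR_MARK_START; // "*" = start from the beginning
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrServer.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    boolean hadEnough = doCustomProcessingOfResults(rsp);
    // An unchanged cursor mark means there are no more results
    if (hadEnough || cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}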
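To make the stopping rule in the loop above concrete, here is a minimal, self-contained simulation in plain Java. It does not use SolrJ or a Solr server: the "index" is just a TreeSet of hypothetical ids, and the cursor is simply the last id returned, so each page resumes strictly after it instead of skipping over start earlier results. It is a sketch under those assumptions that only illustrates the cursor contract (including the "unchanged cursor means done" check), not how Solr actually encodes or evaluates cursorMark internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Objects;
import java.util.TreeSet;

/**
 * Conceptual simulation of cursor-style paging (not SolrJ).
 * Assumption: documents are represented only by hypothetical unique ids,
 * kept sorted in a TreeSet; the cursor is the last id already returned.
 */
public class CursorPagingSketch {

    /** Returns up to `rows` ids that sort strictly after `cursor` (null = from the start). */
    static List<String> fetchPage(TreeSet<String> index, String cursor, int rows) {
        // null plays the role of Solr's "*" start mark in this sketch
        NavigableSet<String> tail = (cursor == null) ? index : index.tailSet(cursor, false);
        List<String> page = new ArrayList<>();
        for (String id : tail) {
            if (page.size() == rows) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    /** Walks the whole "index" page by page, stopping when the cursor stops moving. */
    static List<String> collectAll(TreeSet<String> index, int rows) {
        List<String> seen = new ArrayList<>();
        String cursor = null;
        while (true) {
            List<String> page = fetchPage(index, cursor, rows);
            seen.addAll(page);
            // Next cursor = last id on this page; it stays unchanged when the page is empty
            String next = page.isEmpty() ? cursor : page.get(page.size() - 1);
            if (Objects.equals(cursor, next)) {
                break; // same check as cursorMark.equals(nextCursorMark)
            }
            cursor = next;
        }
        return seen;
    }

    public static void main(String[] args) {
        TreeSet<String> index = new TreeSet<>();
        for (int i = 0; i < 25; i++) {
            index.add(String.format("doc%02d", i));
        }
        List<String> all = collectAll(index, 10);
        System.out.println(all.size() + " ids, first=" + all.get(0)
                + ", last=" + all.get(all.size() - 1));
    }
}
```

Running main prints "25 ids, first=doc00, last=doc24". The fourth request returns an empty page with the same cursor, which is what ends the loop, mirroring the cursorMark.equals(nextCursorMark) check in the SolrJ code.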
However, this requires the query to first sort all matching documents (here, the entire index) on the uniqueKey field, which is defined as:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
Sorting on this field requires a lot of memory, more than my computer has, and the query fails with an OutOfMemoryError.
Is there any workaround for this? Many thanks in advance.