29

I executed a query like "Address:Jack*". It shows numFound = 5214 and displays 100 documents on the results page (I changed the default number of displayed results from 10 to 100).

How can I get all documents?

Abhijit Bashetti
SENTHIL SARAVANAN

8 Answers

39

I remember using &rows=2147483647

2,147,483,647 is the maximum value of a Java int. I recall once using a larger number and getting a NumberFormatException because it couldn't be parsed into an int. I don't know if they use long nowadays, but 2 billion rows is normally more than enough.
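To illustrate the limit mentioned above, here is a minimal sketch in plain Java. It only shows the language-level int boundary; it does not reproduce how Solr itself parses the rows parameter, and the class and method names are made up for the example.

```java
// Minimal sketch: demonstrates the Java int limit only, not Solr's own parsing code.
class RowsLimitDemo {
    /** Returns true if the text fits in a Java int, false if parsing it fails. */
    static boolean fitsInInt(String text) {
        try {
            Integer.parseInt(text);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for values outside the int range, e.g. 2147483648
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);        // 2147483647
        System.out.println(fitsInInt("2147483647"));  // true
        System.out.println(fitsInInt("2147483648"));  // false
    }
}
```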

Small note:
Be careful if you are planning to do this in production. If you run a query like *:* and your index is big, you could be transferring a couple of gigabytes in that one query.
If you know you won't have many docs, go ahead and use the maximum int value.

On the other hand, if you are writing a one-time script and just need to dump all results (for example, document IDs), then this approach is valid, provided you don't mind waiting 3-5 minutes for the query to return.

Fermin Silva
  • 10
    Don't use Integer.MAX_VALUE (2147483647) as the value of rows in production. This will heavily slow down your query even if you have a small result set, because Solr preallocates a queue of this size. See https://issues.apache.org/jira/browse/SOLR-7580 – Simulant Oct 13 '16 at 09:03
  • 3
    Dangerous. Do this only for a small amount of documents. – freedev Apr 27 '17 at 09:14
  • I think that in some cases it's useful to check whether part of the index is consistent with the respective part of the indexed database, especially if the index is generated by the application and the original data is stored in an SQL database. – João Pedro Schmitt Jul 19 '19 at 11:17
7

Don't use &rows=2147483647

Don't use Integer.MAX_VALUE (2147483647) as the value of rows in production. This will heavily slow down your query even if you have a small result set, because Solr preallocates a queue of this size. See https://issues.apache.org/jira/browse/SOLR-7580

I strongly suggest using Exporting Result Sets

It’s possible to export fully sorted result sets using a special rank query parser and response writer specifically designed to work together to handle scenarios that involve sorting and exporting millions of records.

Or I suggest using Deep Paging.

Simple pagination is an easy thing when you have few documents to read and all you have to do is play with the start and rows parameters. But this is not feasible when you have many documents, meaning hundreds of thousands or even millions.
This is the kind of thing that could bring your Solr server to its knees.
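A rough sketch of why large offsets hurt: as described in the Solr reference guide's pagination page, serving a page at offset start means collecting and sorting start + rows entries, and in a distributed index each shard has to send that many to the coordinating node. The class name, method names, and the shard count of 4 below are purely illustrative.

```java
// Back-of-the-envelope sketch; names and the shard count are illustrative assumptions.
class DeepPagingCost {
    /** Entries one node has to collect to serve a page at the given offset. */
    static long entriesPerShard(long start, long rows) {
        return start + rows;
    }

    /** Entries the coordinating node merges when every shard sends its top start + rows. */
    static long entriesMerged(long start, long rows, int shards) {
        return entriesPerShard(start, rows) * shards;
    }

    public static void main(String[] args) {
        System.out.println(entriesPerShard(0, 10));           // 10
        System.out.println(entriesPerShard(1_000_000, 10));   // 1000010
        System.out.println(entriesMerged(1_000_000, 10, 4));  // 4000040
    }
}
```

The first page costs 10 entries; the same 10 documents at offset one million cost over a million entries per shard, which is the overhead cursors avoid.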

For typical applications displaying search results to a human user, this tends to not be much of an issue since most users don’t care about drilling down past the first handful of pages of search results — but for automated systems that want to crunch data about all of the documents matching a query, it can be seriously prohibitive.

This means that if you have a website that pages search results, a real user will not go that far; but consider what can happen if a spider or a scraper tries to read every page of the site.

Now we are talking about Deep Paging.

I suggest reading this amazing post:

https://lucidworks.com/post/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/

And take a look at this documentation page:

https://solr.apache.org/guide/pagination-of-results.html

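And here is an example that tries to explain how to paginate using cursors.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.ORDER;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

// solrClient is assumed to be an already-configured SolrClient
SolrQuery solrQuery = new SolrQuery();
solrQuery.setRows(500);  // page size
solrQuery.setQuery("*:*");
solrQuery.addSort("id", ORDER.asc);  // Pay attention to this line: cursors need a sort on the uniqueKey field (id here)
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    solrQuery.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrClient.query(solrQuery);
    String nextCursorMark = rsp.getNextCursorMark();
    for (SolrDocument d : rsp.getResults()) {
            ... 
    }
    // An unchanged cursor mark means there are no more results
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}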
freedev
6

Returning all the results is never a good option, as it would be very slow.
Can you describe your use case?

Also, Solr's rows parameter lets you tune the number of results returned.
However, I don't think there is a way to set rows so that it returns all results; it doesn't accept -1 as a value.
So you would need to set a high value for all the results to be returned.

Jayendra
  • Our experience (and the advice we received) was the same: using Solr as a filter-and-return-all-results system is far from optimal. It just wasn't designed for returning all results. We do, however, wish there was some way to get at least all matching "keys" (key-field values). See this [similar question](http://stackoverflow.com/questions/16280837/solr-query-get-results-without-scanning-files) – Yonatan Apr 29 '13 at 15:51
3

What you should do is first create a SolrQuery as shown below and set the number of documents you want to fetch per batch.

int lastResult = 0; // last id seen so far; update it after each batch

// Build the range from the current value of lastResult (id is used only for the sake of simplicity)
String query = "id:[" + lastResult + " TO *]";

// setRows sets the batch size (change it to whatever you want); sorting by id, assumed sortable, keeps batches in order
SolrQuery solrQuery = new SolrQuery(query).setRows(500).addSort("id", SolrQuery.ORDER.asc);

SolrDocumentList results = solrClient.query(solrQuery).getResults(); // fetch one batch

Here I am using a search by id as an example; you can replace it with any other field you want to search on.

"lastResult" is the variable you update after processing the first 500 records (500 is the batch size): set it to the last id returned in the results.

This lets you run the next batch starting from the last result of the previous batch. Note that the inclusive lower bound returns that boundary document again.

Hope this helps. Leave a comment below if you need any clarification.
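The batching idea above can be sketched without a Solr server. In this minimal example the in-memory list stands in for the index, fetchBatch stands in for the Solr call, and ids are assumed to be positive integers returned in ascending order; all names are hypothetical. It uses an exclusive lower bound (`{` instead of `[`) so the boundary document is not fetched twice.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of range-based batching; fetchBatch is a stand-in for a real Solr query.
class RangeBatching {
    /** Builds a range query matching ids strictly greater than lastId. */
    static String nextBatchQuery(int lastId) {
        // '{' makes the lower bound exclusive, ']' keeps the upper bound open-ended
        return "id:{" + lastId + " TO *]";
    }

    /** Returns up to batchSize ids greater than lastId, in ascending order. */
    static List<Integer> fetchBatch(List<Integer> sortedIds, int lastId, int batchSize) {
        List<Integer> batch = new ArrayList<>();
        for (int id : sortedIds) {
            if (id > lastId && batch.size() < batchSize) {
                batch.add(id);
            }
        }
        return batch;
    }

    /** Walks all ids batch by batch, advancing lastId to the last id of each batch. */
    static List<Integer> fetchAll(List<Integer> sortedIds, int batchSize) {
        List<Integer> all = new ArrayList<>();
        int lastId = 0; // assumes ids start above 0
        while (true) {
            List<Integer> batch = fetchBatch(sortedIds, lastId, batchSize);
            if (batch.isEmpty()) {
                break; // nothing left beyond lastId
            }
            all.addAll(batch);
            lastId = batch.get(batch.size() - 1);
        }
        return all;
    }

    public static void main(String[] args) {
        System.out.println(nextBatchQuery(500));
        System.out.println(fetchAll(List.of(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```

Against a real index, each batch would also need a sort on id ascending; otherwise the last id of a batch is not a reliable starting point for the next one.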

Apurv Nerlekar
0

To select all documents in dismax/edismax via the Solarium PHP client, the normal query syntax *:* does not work. Instead, set the default query value in the Solarium query to an empty string; this is required because the default query in Solarium is *:*. Then set the alternative query to *:*. The dismax/edismax normal query syntax does not support *:*, but the alternative query syntax does.

For more details, refer to the following book:

http://www.packtpub.com/apache-solr-php-integration/book

jayant
0

As the other answers pointed out, you can set rows to the maximum integer to get back all the results for a query. I would recommend, though, using Solr's pagination feature and building a function that returns all the results via the cursorMark API. The gist of it: set the cursorMark parameter to '*', set the page size (the rows parameter), and each response will include a cursorMark for the next page, so you execute the same query again with the cursorMark from the last response. This way you have more flexibility over how many results you get back, in a much more performant way.
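A minimal sketch of that loop, runnable without Solr: PageFetcher and Page are hypothetical stand-ins for the real request and response, and the in-memory fake uses the last id as its cursor, whereas real Solr cursor marks are opaque strings. The part that carries over is the loop: start at '*', resend the returned mark, stop when the mark comes back unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the cursorMark loop against an in-memory fake; not SolrJ code.
class CursorPagingSketch {
    /** One page of results plus the cursor to send with the next request. */
    static final class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Hypothetical stand-in for a Solr request; real code would call the server. */
    interface PageFetcher {
        Page fetch(String cursorMark, int rows);
    }

    /** In-memory fake: the cursor is simply the last id returned so far. */
    static PageFetcher inMemory(List<String> sortedIds) {
        return (cursorMark, rows) -> {
            List<String> page = new ArrayList<>();
            for (String id : sortedIds) {
                boolean afterCursor = cursorMark.equals("*") || id.compareTo(cursorMark) > 0;
                if (afterCursor && page.size() < rows) {
                    page.add(id);
                }
            }
            // When nothing is left, return the same cursor to signal the end
            String next = page.isEmpty() ? cursorMark : page.get(page.size() - 1);
            return new Page(page, next);
        };
    }

    /** Collects every result by following cursor marks until they stop changing. */
    static List<String> fetchAll(PageFetcher fetcher, int rows) {
        List<String> all = new ArrayList<>();
        String cursorMark = "*";
        while (true) {
            Page page = fetcher.fetch(cursorMark, rows);
            all.addAll(page.ids);
            if (cursorMark.equals(page.nextCursorMark)) {
                break; // unchanged cursor: no more pages
            }
            cursorMark = page.nextCursorMark;
        }
        return all;
    }

    public static void main(String[] args) {
        PageFetcher fetcher = inMemory(List.of("a", "b", "c", "d", "e"));
        System.out.println(fetchAll(fetcher, 2));
    }
}
```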

itayad
0

The way I dealt with the problem is by running the query twice:

// Start with your (usually small) default page size
solrQuery.setRows(50);
QueryResponse response = solrResponse(query);
long numFound = response.getResults().getNumFound();
if (numFound > 50) {
    // getNumFound() returns a long, while setRows() takes an Integer
    solrQuery.setRows(Math.toIntExact(numFound));
    response = solrResponse(query);
}

It calls Solr twice, but gets you all matching records, with a small performance penalty.

Ya.Ma
-3

query.setRows(Integer.MAX_VALUE); works for me!!

atpatil11
  • How did you get this to work with this statement? I get only 10 results even after using it. – Divyang Shah Feb 18 '15 at 10:16
  • 2
    Careful with this. I was using it in a very specific case where the actual number was limited by the client app. When I deployed to the production server, I got a java.lang.NegativeArraySizeException because of this. – s1m3n Jun 21 '15 at 09:52