I know variants of this question have been asked before (even by me), but I still don't understand a thing or two about this...

It was my understanding that one could retrieve more documents than the 128 default setting by doing this:

session.Advanced.MaxNumberOfRequestsPerSession = int.MaxValue;

And I've learned that a WHERE clause should be an ExpressionTree instead of a Func, so that it's treated as Queryable instead of Enumerable. So I thought this should work:

public static List<T> GetObjectList<T>(Expression<Func<T, bool>> whereClause)
{
    using (IDocumentSession session = GetRavenSession())
    {
        return session.Query<T>().Where(whereClause).ToList();                
    }
}

However, that only returns 128 documents. Why?

Note, here is the code that calls the above method:

RavenDataAccessComponent.GetObjectList<Ccm>(x => x.TimeStamp > lastReadTime);

If I add Take(n), then I can get as many documents as I like. For example, this returns 200 documents:

return session.Query<T>().Where(whereClause).Take(200).ToList();

Based on all of this, it would seem that the appropriate way to retrieve thousands of documents is to set MaxNumberOfRequestsPerSession and use Take() in the query. Is that right? If not, how should it be done?

For my app, I need to retrieve thousands of documents (each with very little data in it). We keep these documents in memory and use them as the data source for charts.

** EDIT **

I tried using int.MaxValue in my Take():

return session.Query<T>().Where(whereClause).Take(int.MaxValue).ToList();

And that returns 1024. Argh. How do I get more than 1024?

** EDIT 2 - Sample document showing data **

{
  "Header_ID": 3525880,
  "Sub_ID": "120403261139",
  "TimeStamp": "2012-04-05T15:14:13.9870000",
  "Equipment_ID": "PBG11A-CCM",
  "AverageAbsorber1": "284.451",
  "AverageAbsorber2": "108.442",
  "AverageAbsorber3": "886.523",
  "AverageAbsorber4": "176.773"
}
Bob Horn

5 Answers

It is worth noting that since version 2.5, RavenDB has an "unbounded results API" to allow streaming. The example from the docs shows how to use this:

var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
using (var enumerator = session.Advanced.Stream(query))
{
    while (enumerator.MoveNext())
    {
        User activeUser = enumerator.Current.Document;
    }
}

There is support for standard RavenDB queries and Lucene queries, and there is also async support.

The documentation can be found here. Ayende's introductory blog article can be found here.
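
For the async support mentioned above, here is a minimal sketch. It assumes a RavenDB client version that exposes `StreamAsync` on the async session (as the 3.x client does), and it reuses the `User` type and the `Users/ByActive` index from the example; the `ActiveUserStreamer` class and its method name are hypothetical.

```csharp
using System.Linq;
using System.Threading.Tasks;
using Raven.Client;

// Hypothetical document type matching the example above.
public class User
{
    public string Id { get; set; }
    public bool Active { get; set; }
}

public static class ActiveUserStreamer
{
    // Streams all active users and returns how many were seen.
    // Assumes the "Users/ByActive" index already exists on the server.
    public static async Task<int> CountActiveUsersAsync(IDocumentStore store)
    {
        int count = 0;
        using (IAsyncDocumentSession session = store.OpenAsyncSession())
        {
            var query = session.Query<User>("Users/ByActive").Where(x => x.Active);
            using (var enumerator = await session.Advanced.StreamAsync(query))
            {
                while (await enumerator.MoveNextAsync())
                {
                    User activeUser = enumerator.Current.Document;
                    if (activeUser != null)
                    {
                        count++;
                    }
                }
            }
        }
        return count;
    }
}
```

As with the synchronous version, results arrive one at a time, so the page-size limit does not apply.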

Sean Kearon
  • Beware that when querying using the Streaming API, the index must already exist. If you run a query through the normal session API, and no matching index exists, a dynamic index will be created. But in the streaming API, the dynamic index is not created and the server complains that the index is not found. – Mike Schenk Feb 12 '14 at 18:42
  • Mike - that's interesting behaviour, it sounds like a bug. Have you discussed this in the RavenDB group? – Sean Kearon Feb 14 '14 at 07:41
  • You can use the `Stream(startsWith)` overload to get all docs in a specific collection; no need to use query if you don't need to perform a query. – kamranicus Dec 03 '14 at 22:32
  • Maybe nice to add is that you don't have to specify a collection and let it figure out by convention (`session.Advanced.Stream(documentStore.Conventions.GetTypeTagName(typeof(User)))`). Can be useful for repositories. – Caramiriel Mar 11 '16 at 15:25
  • Can we use `.Statistics(out stats)` in a streaming query? – Rudey Oct 27 '17 at 08:50

The Take(n) function will only give you up to 1024 by default. However, you can change this default in Raven.Server.exe.config:

<add key="Raven/MaxPageSize" value="5000"/>

For more info, see: http://ravendb.net/docs/intro/safe-by-default

Gert Arnold
Mike Christensen
  • Thanks, Mike. I think this will end up being the accepted answer, but I'd like to see if anyone else has a different angle on this first. – Bob Horn Apr 07 '12 at 01:00

The Take(n) function will only give you up to 1024 by default. However, you can combine it with Skip(n) to page through all the results:

        var points = new List<T>();
        List<T> nextGroupOfPoints;
        const int ElementTakeCount = 1024;
        int i = 0;
        int skipResults = 0;
        RavenQueryStatistics stats;

        do
        {
            // Each iteration is one server request and fetches one page.
            nextGroupOfPoints = session.Query<T>().Statistics(out stats).Where(whereClause).Skip(i * ElementTakeCount + skipResults).Take(ElementTakeCount).ToList();
            i++;
            skipResults += stats.SkippedResults;

            points.AddRange(nextGroupOfPoints);
        }
        while (nextGroupOfPoints.Count == ElementTakeCount);

        return points;

RavenDB Paging
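
The Skip/Take loop can also be factored into a small helper that does not depend on RavenDB at all. This is only a sketch: `Paging`, `FetchAll`, and the `fetchPage` delegate are hypothetical names, and the helper ignores `SkippedResults`. In a RavenDB app the delegate might run `Query<T>().Where(...).Skip(skip).Take(take).ToList()`, possibly in a fresh session per page, since every page is a separate server request.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Paging
{
    // Calls fetchPage(skip, take) repeatedly until a short page signals the end.
    public static List<T> FetchAll<T>(Func<int, int, IList<T>> fetchPage, int pageSize)
    {
        if (pageSize <= 0)
        {
            throw new ArgumentOutOfRangeException("pageSize");
        }

        var all = new List<T>();
        while (true)
        {
            // Skip everything collected so far and ask for the next page.
            IList<T> page = fetchPage(all.Count, pageSize);
            all.AddRange(page);
            if (page.Count < pageSize)
            {
                break;
            }
        }
        return all;
    }
}

public static class Program
{
    public static void Main()
    {
        // In-memory stand-in for a server that returns at most 1024 items per request.
        List<int> source = Enumerable.Range(0, 2500).ToList();
        List<int> result = Paging.FetchAll<int>(
            (skip, take) => source.Skip(skip).Take(Math.Min(take, 1024)).ToList(),
            1024);
        Console.WriteLine(result.Count);
    }
}
```

The loop stops on the first page that comes back shorter than `pageSize`, so a result set whose size is an exact multiple of the page size costs one extra (empty) request.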

Aleksey Cherenkov
  • This method is by far the better approach. – Matt May 09 '13 at 16:03
  • Beware the limit on the number of server requests. As per Raven's "safe by default" settings, it will only make up to 30 round-trips to the server, so if the loop needs to execute more than that, it will fail because each iteration of the loop is another server request. – Mike Schenk Feb 12 '14 at 18:47

The number of requests per session is a separate concept from the number of documents retrieved per call. Sessions are short-lived and are expected to have few calls issued over them.

If you are getting more than 10 of anything from the store (even fewer than the default 128) for human consumption, then something is wrong, or your problem requires different thinking than a truckload of documents coming from the data store.

RavenDB indexing is quite sophisticated. Good article about indexing here and facets here.

If you need to perform data aggregation, create a map/reduce index that produces the aggregated data, e.g.:

Index:

    from post in docs.Posts
    select new { post.Author, Count = 1 }

    from result in results
    group result by result.Author into g
    select new
    {
       Author = g.Key,
       Count = g.Sum(x=>x.Count)
    }

Query:

// Assumed reconstruction: sort by author and materialize the results.
session.Query<AuthorPostStats>("Posts/ByUser/Count").OrderBy(x => x.Author).ToList();
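
The same map/reduce index could also be defined in code. This is a sketch that assumes the `AbstractIndexCreationTask<TDocument, TReduceResult>` base class from the RavenDB client; the `Post` and `AuthorPostStats` classes are hypothetical shapes inferred from the index above. By convention, the class name `Posts_ByUser_Count` maps to the index name `Posts/ByUser/Count`.

```csharp
using System.Linq;
using Raven.Client.Indexes;

// Hypothetical document type: only the field used by the index is shown.
public class Post
{
    public string Author { get; set; }
}

// Hypothetical reduce result: one entry per author.
public class AuthorPostStats
{
    public string Author { get; set; }
    public int Count { get; set; }
}

public class Posts_ByUser_Count : AbstractIndexCreationTask<Post, AuthorPostStats>
{
    public Posts_ByUser_Count()
    {
        // Map: emit one entry with Count = 1 for every post.
        Map = posts => from post in posts
                       select new { post.Author, Count = 1 };

        // Reduce: sum the counts per author.
        Reduce = results => from result in results
                            group result by result.Author into g
                            select new
                            {
                                Author = g.Key,
                                Count = g.Sum(x => x.Count)
                            };
    }
}
```

The aggregation runs on the server, so a query against this index returns one small result per author rather than every post.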
Petar Vučetin
  • So how would you solve this issue? The business wants to see a chart showing the last 24 hours worth of data points. Each document is a data point, and there are 10,000 of those over the last 24 hours. How do you chart that without bringing all the data over? – Bob Horn Apr 07 '12 at 01:03
  • I would think that you can achieve this by creating indices or [facets](http://ravendb.net/docs/client-api/faceted-search) – Petar Vučetin Apr 07 '12 at 01:08
  • I just noticed "each document is a data point" - can you show an example of this document? – Petar Vučetin Apr 07 '12 at 01:34
  • It's actually more accurate to say that each document contains one or more data points for one or more charts. I edited my question above to show a sample document with its data points. Also, thank you for your help. I'm not familiar with facets. I'll have to check it out. Finally, I don't think aggregated data will work because the chart plots each data point individually. – Bob Horn Apr 07 '12 at 01:54
  • I have not seen a chart with 10,000 data points that makes any sense when viewed by humans (but stranger things exist). One idea would be to reduce granularity. You could create a set of documents that represent aggregate data over a period of less than 24 hours, e.g. a snapshot every hour. If the business wants to dig deeper, then, well, open all the faucets and bring the trucks :) – Petar Vučetin Apr 07 '12 at 21:44
  • I agree with you. And I think your snapshot idea is a good one. However, I'm only involved with this project on the RavenDB side of things. The decision to process this way was made when version 1 was done. While I/we may be able to change the architecture, we also may not be able to do so. Thanks! – Bob Horn Apr 07 '12 at 21:57

You can also use a predefined index with the Stream method. You may use a Where clause on indexed fields.

var query = session.Query<User, MyUserIndex>();
// or, with a Where clause on an indexed field:
// var query = session.Query<User, MyUserIndex>().Where(x => !x.IsDeleted);

using (var enumerator = session.Advanced.Stream<User>(query))
{
    while (enumerator.MoveNext())
    {
        var user = enumerator.Current.Document;
        // do something
    }
}

Example index:

public class MyUserIndex: AbstractIndexCreationTask<User>
{
    public MyUserIndex()
    {
        this.Map = users =>
            from u in users
            select new
            {
                u.IsDeleted,
                u.Username,
            };
    }
}

Documentation: "What are indexes?" and "Session: Querying: How to stream query results?"


Important note: the Stream method will NOT track objects. If you change objects obtained from this method, SaveChanges() will not be aware of any change.


Other note: you may get the following exception if you do not specify the index to use.

InvalidOperationException: StreamQuery does not support querying dynamic indexes. It is designed to be used with large data-sets and is unlikely to return all data-set after 15 sec of indexing, like Query() does.
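
Because streaming needs the index to exist on the server beforehand, one option is to register it at application startup. This is a sketch that assumes the `Execute(IDocumentStore)` method on `AbstractIndexCreationTask` and reuses `MyUserIndex` from the example above; `IndexSetup` and `EnsureIndexes` are hypothetical names.

```csharp
using Raven.Client;
using Raven.Client.Indexes;

public static class IndexSetup
{
    // Creates (or updates) MyUserIndex on the server before any streaming query runs.
    // MyUserIndex is the index class defined in the example above.
    public static void EnsureIndexes(IDocumentStore store)
    {
        new MyUserIndex().Execute(store);
    }
}
```

Calling `IndexSetup.EnsureIndexes(store)` once after the document store is initialized should be enough; a newly created index may still be stale for a while as the server indexes existing documents.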

SandRock