3

In RavenDB 4 (v4.0.3-patch-40031) I have two Document types: Apple and Orange. Both have similar, but also distinct, properties. I run into a bug in my code at runtime where sometimes an the ID of an Apple is provided, but an Orange is returned. Scary!

Diving into it, it somewhat makes sense. But I'm struggling with an appropriate solution.

Here goes. In RavenDB, I have stored a single Apple as a Document:

id: "078ff39b-da50-4405-9615-86b0d185ba17"
{
    "Name": "Elstar",
    "@metadata": {
        "@collection": "Apples",
        "Raven-Clr-Type": "FruitTest.Apple, FruitTest"
    }
}

Assume for the sake of this example that I have no Orange documents stored in the database. I would expect this test to succeed:

// arrange - use the ID of an apple, which does not exist in Orange collection
var id_of_apple = "078ff39b-da50-4405-9615-86b0d185ba17";

// act - load an Orange
var target = await _session.LoadAsync<Orange>("078ff39b-da50-4405-9615-86b0d185ba17");

// assert - should be null, because there is no Orange with that Id
target.Should().BeNull(because: "provided ID is not of an Orange but of an Apple");

... but it fails. What happens is that the Document ID exists, so the RavenDB loads the document. Not caring what type it is. And it attempts to map the properties automatically. I expected, or assumed incorrectly, that the Load type specifier would limit the lookup to that particular document collection. Instead, it grabs + maps it throughout the entire database, not constraining it to type <T>. So the behaviour is different from .Query<T>, which does constraint to collection.

Important to note is that I'm using guids as identity strategy, by setting the Id to string.Empty (conform the docs). I assume the default ID strategy, which is like entityname/1001, would not have this issue.

The docs on Loading Entities don't really mention if this is intentional, or not. It only says: "download documents from a database and convert them to entities.".

However, for reasons, I do want to constrain the Load operation to a single collection. Or, better put, as efficiently as possible load a document by ID, from a specific collection. And if it does not exist, return null.

AFAIK, there are two options to achieve this:

  1. Use the more expensive .Query<T>.Where(x => x.Id == id), instead of .Load<T>(id)
  2. Do the .Load<T>(id) first and then check (~somehow, see bottom) if it is part of collection T

My problem can be summarized in two questions:

  1. Is there another, more performant or stable way, than the two options mentioned above?
  2. If there is not, out of the two options - which is recommended in terms of performance and stability?

Especially for the second question, it is very hard to correctly measure this properly. As for stability, e.g. not having side effects, that is something that I guess someone with more in-depth knowledge or experience of the RavenDB internals might shed some light on.

N.B. The question assumes that the explained behaviour is intentional and not a RavenDB bug.

~Somehow would be:

public async Task<T> Get(string id)
{
    var instance = await _session.LoadAsync<T>(id);
    if (instance == null) return null;

    // the "somehow" check for collection
    var expectedTypeName = string.Concat(typeof(T).Name, "s");
    var actualTypeName = _session.Advanced.GetMetadataFor(instance)[Constants.Documents.Metadata.Collection].ToString();
    if (actualTypeName != expectedTypeName)
    {
        // Edge case: Apple != Orange
        return null;
    }

    return instance;
}

How to reproduce

UPDATE 2018/04/19 - Added this reproducible sample after helpful comments (thanks for that).

Models

public interface IFruit
{
    string Id { get; set; }
    string Name { get; set; }
}

public class Apple : IFruit
{
    public string Id { get; set; }
    public string Name { get; set; }
}

public class Orange : IFruit
{
    public string Id { get; set; }
    public string Name { get; set; }
}

Tests
E.g. throws InvalidCastException in same session (works), but in second it doesn't.

public class UnitTest1
{
    [Fact]
    public async Task SameSession_Works_And_Throws_InvalidCastException()
    {
        var store = new DocumentStore()
        {
            Urls = new[] {"http://192.168.99.100:32772"},
            Database = "fruit"
        }.Initialize();

        using (var session = store.OpenAsyncSession())
        {
            var apple = new Apple
            {
                Id = Guid.NewGuid().ToString(),
                Name = "Elstar"
            };

            await session.StoreAsync(apple);
            await session.SaveChangesAsync();

            await Assert.ThrowsAsync<InvalidCastException>(() => session.LoadAsync<Orange>(apple.Id));
        }
    }

    [Fact]
    public async Task Different_Session_Fails()
    {
        var store = new DocumentStore()
        {
            Urls = new[] {"http://192.168.99.100:32772"},
            Database = "fruit"
        }.Initialize();

        using (var session = store.OpenAsyncSession())
        {
            var appleId = "ca5d9fd0-475b-41de-a1ab-57bb1e3ce018";

            // this *should* break, because... it's an apple
            // ... but it doesn't - it returns an ORANGE
            var orange = await session.LoadAsync<Orange>(appleId);

            await Assert.ThrowsAsync<InvalidCastException>(() => session.LoadAsync<Orange>(appleId));
        }
    }
}
Juliën
  • 9,047
  • 7
  • 49
  • 80
  • to me it throws a Cast Exception, are you sure that it isn't failing about this? – Embri Apr 19 '18 at 12:30
  • @Embri yes, 100% positive. It doesn't fail in the sense of throwing an error - it actually returns the wrong document type (mapped the properties). I've added a sample on ~Somehow - which explains it perhaps even better. Again, please note the ID-generation strategy is different from the default. – Juliën Apr 19 '18 at 12:33
  • Also important to note that Apples and Oranges have similar properties. That's perhaps why I don't receive a Cast exception? @Embri – Juliën Apr 19 '18 at 12:42
  • infact i don't understand why you don't recieve that exception..Apples and Oranges are subclasses of the same base class? – Embri Apr 19 '18 at 12:44
  • No, they derive from an interface (with Id as property) but there's no base class. Just tested, but the InvalidCastException only occurs when you do a Store + Get **in the same session**. If you do the LoadAsync with an existing ID in a new session, my problem should be reproduced. – Juliën Apr 19 '18 at 13:07
  • i tried but i always get the exception. `"@collection": "Contacts",` is it correct? – Embri Apr 19 '18 at 13:12
  • Corrected that metadata, sorry. Also added a fully reproducible sample incl. test fixtures. The second test fails if you provide an existing ID that exist for an Apple (not an Orange) in the database. – Juliën Apr 19 '18 at 13:23

2 Answers2

3

.Query<T>.Where(x => x.Id == id) is the way to go. In RavenDB 4.0, queries by ID are handled directly by the documents storage under the covers (not by an index), so it's the same efficient as Load.

The advantage for your scenario is that queries are scoped to a specified collection only.

  • Thanks. I can't find just yet [in the source](https://github.com/ravendb/ravendb/) where under the hood it is actually the same internals. Perhaps you can point out a bit where this under the hood happens? Also, doesn't Query may create dynamic indexes automatically in this case (which I want to avoid, because the Id's are already optimized for single fetches so that seems an unneccessary step)? – Juliën Apr 19 '18 at 17:37
  • 2
    "Querying by ID is handled differently. Instead of going through an index, if the query optimizer can see that your query is using an ID, it’ll fetch the relevant documents directly using their IDs and pass them to the rest of the query for processing. This applies when you’re querying by ID and don’t have additional filters, sorting or the like that would require the use of an index." (Inside RavenDB 4.0 book, https://github.com/ravendb/book/releases) – arek.palinski Apr 19 '18 at 19:21
  • 2
    In the source such query is called a _collection query_ and it's handled by [CollectionQueryRunner](https://github.com/ravendb/ravendb/blob/v4.0/src/Raven.Server/Documents/Queries/Dynamic/CollectionQueryRunner.cs) – arek.palinski Apr 19 '18 at 19:23
1

well, i found what should be the problem but i don't understand why.

you said:

by setting the Id to string.Empty

but in the example you wrote Id = Guid.NewGuid().ToString(); in my tests i explicitly assign string.Empty and i get the cast exception, when i assigned the generated Guid to the entity (like you) i reproduced your situations. Probably ravendb makes some different considerations in these two cases that creates this behavior, i don't know if it could be considered a bug.

Then use string.Empty

Embri
  • 622
  • 3
  • 15
  • Yes, you are right! My mistake was somewhere in the line I changed to `Guid.NewGuid()`, instead of `string.Empty` (as I mentioned). Changing this back does throw the InvalidCastException, which is definitely fine for me so I don't have to do the quirky workarounds. Thanks! – Juliën Apr 19 '18 at 17:37