Prevent the DbContext from repeatedly trying to save bad data

Question

I have a process that is importing an Excel Spreadhseet, and parsing the data into my data objects. The source of this data is very questionable, as we're moving our customer from a spreadsheet-based data management into a managed database system with checks for valid data.

During my import process, I do some basic sanity checks of the data to accommodate just how bad the data could be that we're importing, but I have my overall validation being done in the DbContext.

Part of what I'm trying to do is that I want to provide the Row # in the spreadsheet that the data is bad so they can easily determine what they need to fix to get the file to import.

Once I have the data from the spreadsheet (model), and the Opportunity they're working with from the database (opp), here's the pseudocode of my process:

foreach (var model in Spreadsheet.Rows) { // Again, pseudocode
    if(opp != null && ValidateModel(model, opp, row)) {
        // Copy properties to the database object

        // This is in a Repository-layer method, not directly in my import process.
        // Just written here for clarity instead of several nested method calls.
        context.SaveChanges(); 
    }
}

I can provide more of the code here if needed, but the problem comes in my DbContext's ValidateEntity() method (override of DbContext).

Again, there's nothing wrong with the code that I've written, as far as I'm aware, but if an Opportunity that failed this level of validation, then it stays as part of the unsaved objects in the context, meaning that it repeatedly tries to get validated every time the ValidateEntity() is called. This leads to a repeat of the same Validation Error message for every row after the initial problem occurs.

Is there a way to [edit]get the Context to stop trying to validate the object after it fails validation once[edit]? I know I could wait until the end and call context.SaveChanges() once at the end to get around this, but I would like to be able to match this with what row it is in the Database.

For reference, I am using Entity Framework 6.1 with a Code First approach.

EDIT Attempting to clarify further for Marc L. (including an update to the code block above)

Right now, my process will iterate through as many rows as there are in the Spreadsheet. The reason why I'm calling my Repository layer with each object to save, instead of working with an approach that only calls context.SaveChanges() once is to allow myself the ability to determine which row is the one that is causing a validation error.

I'm glad that my DbContext's custom ValidateEntity() methods are catching the validation errors, but the problem resides in the fact that it is not throwing the DbEntityValidationException for the same entity multiple times.

I would like it so that if the object fails validation once, the context no longer tries to save the object, regardless of how many times context.SaveChanges() is called.

Dupe of http://stackoverflow.com/questions/5599147/how-do-i-detach-objects-in-entity-framework-code-first? — Jimmy, Jun 02 '14 at 19:50
"is there a way to 'detach' the faulty object from the `context` without removing from the database?..." Maybe I'm misunderstanding, but if the object is valid enough to get into the database, then from the `DbContext` perspective it *is* valid. If it's invalid at a higher, business-object perspective, that's a different level of validation that should be handled elsewhere, and your custom `DbContext` is suffering from a lack of appropriate separation of concerns... please clarify. — Marc L., Jun 02 '14 at 19:53
@Jimmy I don't think that's really what I'm looking to do. Perhaps "detach" was an incorrect term I was trying to use. — krillgar, Jun 02 '14 at 19:54

Marc L. · Accepted Answer · 2014-06-03T14:59:15.780

Your question is not a dupe (this is about saving, not loaded entities) but you could follow Jimmy's advice above. That is, once an entity is added to the context it is tracked in the "added" state and the only way to stop it from re-validating is by detaching it. It's an SO-internal link, but I'll reproduce the code snippet:

dbContext.Entry(entity).State = EntityState.Detached;

However, I don't think that's the way you want to go, because you're using exceptions to manage state unnecessarily (exceptions are notoriously expensive).

Working from the information given, I'd use a more set-based solution:

modify your model class so that it contains a RowID that records the original spreadsheet row (there's probably other good reasons to have this, too)
turn off entity-tracking for the context (turns of change detection allowing each Add() to be O(1))
add all the entities
call context.GetValidationErrors() and get all your errors at once, using the aforementioned RowID to identify the invalid rows.

You didn't indicate whether your process should save the good rows or reject the file as a whole, but this will accommodate either--that is, if you need to save the good rows, detach all the invalid rows using the code above and then SaveChanges().

Finally, if you do want to save the good rows and you're uncomfortable with the set-based method, it would be better to use a new DbContext for every single row, or at least create a new DbContext after each error. The ADO.NET team insists that context-creation is "relatively cheap" (sorry I don't have a cite or stats at hand for this) so this shouldn't damage your throughput too much. Even so, it will at least remain O(n). I wouldn't blame you, managing a large context can open you up to other issues as well.

Thanks for the response. One point I would clarify on what you said above is that `SaveChanges()` will only validate on attached entities that are in `Added`, `Modified`, and `Deleted` states. So after it does save successfully the first time, it gets moved to `Unchanged` and won't be checked again. If every object doesn't pass validation, then yes, I would agree with the O(n^2) scaling you mentioned. — krillgar, Jun 03 '14 at 12:11
I would really like to find out more information on how "expensive" a DbContext is to dispose and create a new one for. I would consider that much more resource intensive than throwing an Exception. — krillgar, Jun 03 '14 at 12:19
Thanks for the help on this. Sorry for being so cryptic with other things that were going on, as I didn't think they were really necessary and I didn't want to confuse the matter any more. I am saving everything in a batch, and did a combination of all of the stuff you suggested in the top portion. As I said above, I think the new Context for every item is far too expensive to implement that way. If I can be proven wrong, great! — krillgar, Jun 03 '14 at 18:25

Prevent the DbContext from repeatedly trying to save bad data

1 Answers1