Using Serialization to copy entities between two ObjectContexts in Entity Framework

Question

Preface

I know this is probably already regarded as crazy question, but I am looking for the most educated advice and PROVEN recommendation on how to proceed with copying ALL data (i.e., all entities and relationships) from an ObjectContext to a newly created ObjectContext backed in a different store. See "Cloning" EntityConnections and ObjectContexts in Entity Framework to check out how I'm setting this up.

Intro

I've seen Cloning data on Entity Framework, but:

I'm looking for the whole enchilada, i.e., the whole object graph: all entities & relationships. Later on, I'll go for a more fined-grained selection of what parts of the object graph)
As per my updates below, I haven't made serialization to work, but only on special cases. I feels like it really be a not-so-complex task, but it's been surprisingly challenging. I definitely need some insight on how to make it work.

Step 1

OK, this is what I've tried so far.

I am aware that when I use the Detach/Attach power duo, relationships are kaput—and I ACTUALLY want to preserve the whole object graph.

Therefore, I was thinking of loading the root entities with the MergeOption.NoTracking option:

var serializer = new DataContractSerializer(typeof(Root));

var query = (ObjectQuery<Root>) sourceContext.Roots
    .Include(d => d.Children.Select(c => c.MoreChildren.Select(r => r.EvenMoreChildren)))
    .Include(d => d.Children.Select(c => c.MoreChildren.Select(r => r.MoreAndMoreChildren)))
    .Include(d => d.Children.Select(c => c.MoreChildren.Select(r => r.Shizzle)));

foreach (var d in query.Execute(MergeOption.NoTracking)) {
    //sourceContext.Detach(d); // not needed
    Print(d);
    using (var ios = new MemoryStream()) {
        serializer.WriteObject(ios, d);
        ios.Seek(0, SeekOrigin.Begin);
        var dd = (Root) serializer.ReadObject(ios);
        //Console.WriteLine(string.Join(",", dd.EntityKey.EntityKeyValues.Select(k => k.Key + "=" + k.Value)));
        targetContext.Roots.AddObject(dd);
    }
}

Given that I am loading entities as non-tracked, the I don't need to call sourceContext.Detach(d) anymore.

The Print method simply prints the child-object tree, and it shows that up to that point things are going well (I won't show it here cuz it's huge and irrelevant).

However, now the whole thing is blowing @ serializer.WriteObject(ios, d) with the following message:

"When an object is returned with a NoTracking merge option, Load can only be called when the EntityCollection or EntityReference does not contain objects."

(Which kinda makes sense, because the serializer is probably trying to lazy load related entities.)

Remember, if I don't use NoTracking, I have to detach the entities but then I would lose my relationships...

Step 2

Of course I've tried setting sourceContext.ContextOptions.LazyLoadingEnabled = false just before executing the serializing loop, and that fixes the error above, but that results in the infamous:

"The object could not be added or attached because its EntityReference has an EntityKey property value that does not match the EntityKey for this object."

Also, remember I still cannot uncomment the sourceContext.Detach(d) since I loaded the roots with NoTracking...

Step 3

I've tried setting the EntityKey = null before serialization and even after deserialization on the cloned entity... all to no avail:

sourceContext.ContextOptions.LazyLoadingEnabled = false;

foreach (var d in query.Execute(MergeOption.NoTracking)) {
    //sourceContext.Detach(d);
    Print(d);
    d.EntityKey = null;
    using (var ios = new MemoryStream()) {
        serializer.WriteObject(ios, d);
        ios.Seek(0, SeekOrigin.Begin);
        var dd = (Root) serializer.ReadObject(ios);
        if (dd.EntityKey != null) {
            Console.WriteLine("Deserialized EntityKey: {0}", string.Join(",", dd.EntityKey.EntityKeyValues.Select(k => k.Key + "=" + k.Value)));
            dd.EntityKey = null;
        }
        targetContext.Roots.AddObject(dd);
    }
}

What's fishy is that I don't even know what bloody "object" is the exception above talking about.

Am I really, really, really, really DOOMED in my attempt to have a purely EF approach to solve this problem???

Am I completely DOOMED?!?!?! :(

This should be done on the SQL side, probably using a stored procedure — Aducci, Sep 26 '12 at 14:59
Well, that's precisely what I am NOT looking to do. Let me explain: The EF is this marvelous achievement because, among may things, it does all the reference tracking of an object graph for u, and takes care of the concepts of entity keys and identities marvelously. My relationships are fairly complex already with just few tables, I use a TPT strategy, identity columns for primary keys, polymorphic 1—* and 1—0..1 associations and not just for the fun of it. My entities change. Often. Doing this task generically on the SQL side is a known problem that nobody has actually solved. — Fernando Espinosa, Sep 26 '12 at 15:07
Regardless of the complexity, using SQL it is very easy to copy data from one source to another. — Aducci, Sep 26 '12 at 15:15
"Doing this task generically on the SQL side is a known problem that nobody has actually solved" - Have you done any research to verify this? There's a slew of ETL tools that do exactly that, and can be adapted _quickly_ to account for structure changes without a lot of effort.. — D Stanley, Sep 26 '12 at 15:17
Sorry, I should have been less dramatic: When I said "doing this task generically on the SQL side is a known problem that nobody has actually solved" I meant more like "integrated generically with client code"... I shouldn't have written "not solved" (a backup essentially does that) but sometimes I will only want to extract and "merge" only sub-graphs of the object graph (I mentioned that in the post) — Fernando Espinosa, Sep 26 '12 at 15:27
I just want to pin-point and come fully honest: I really only want to use the DB as a repository, as mere store... Along the lines of CodeFirst (actually one of the reasons I want to do this, is because I'm migrating to CodeFirst, but will be working with two APIs for now...) Let me quote http://stackoverflow.com/a/5446587/1426433: "General expectation is that you do not bother with DB. DB is just a storage with no logic. EF will handle creation and you don't want to know how it do the job." — Fernando Espinosa, Sep 26 '12 at 15:33
If it is that hard to make it work it probably won't work correctly... — Pawel, Sep 26 '12 at 18:21

score 5 · Answer 1 · answered Sep 26 '12 at 18:10

5

@khovanskiiªn - here's the thing. Despite what you say about the wonderfulness of EF, it's not designed to do this. You can wax poetic all day about the wonders of a hammer, but it won't help you if your problem requires a wrench.

There are lots of solutions out there that will do what you want, but don't use EF and do it on the SQL side, it even has a fancy name. It's called ETL or Extract/Transform/Load, it's sole purpose is to take data from one set of tables and move it to another, possibly massaging the data in between.

You can, in fact, integrate these tools with client code seamlessly. For example, an SSIS package can be executed from client code, and parameters passed to it to control which data is worked on.

The fact of the matter is, Entity Framework is designed to work with a single context, and it only tracks relationships and changes on a single context. As soon as you separate it, it loses that information.

What's more, Entity framework is horrendously inefficient for this kind of thing because EF works on single entities, rather than bulk/batch operations. If you have 1 million records, it could take all day to do a single operation, whereas a sql-side batch operation could take a few minutes.

Do yourself a favor, look into SSIS (Sql Server Integration Services).

answered Sep 26 '12 at 18:10

Erik Funkenbusch

92,674
28
195
291

I appreciate the length and civility, and I concur with your last statement about EF's horrendous efficiency. EF has many design issues, and so I am learning that this is one more. My fact was that up until now I basically did not have to touch the DB server at all (well, I'm lying, I started on the DB side, but then I started using EF and totally shifted towards code) and hoped that I would not come back, not because I don't like SQL, but because I hate having to maintain two worlds...) All that said, I still long for being able to solve this via EF. I will research ETL/SSIS... – Fernando Espinosa Sep 26 '12 at 19:08
@khovanskiiªn - Everything has tradeoffs. Just because EF isn't good at bulk/batch jobs doesn't mean its a "design issue". SQL isn't good at object oriented modeling, that doesn't make it a flaw in the design of SQL. It's just that you make tradeoffs in any technology, and you use the tool that fits the job. The flaw here is choosing the wrong tool for the job. – Erik Funkenbusch Sep 26 '12 at 19:13
Well, I accept that as well... I mean, in my experience I've also been disillusioned with [EF's data-binding](http://stackoverflow.com/q/11061548/1426433). Just let me emphasize, I'm not planning to work with several context within the same application. All I'm asking for is as simple as wanting to do something like a `"WriteAllEntities()"` after I change the entity connection, that doesn't sound like a maniacal request to EF, does it? Probably I should just tried that directly, step 1 by 1 thru all entities, mark them "dirty", change the connection, and then do `SaveChanges()`. – Fernando Espinosa Sep 26 '12 at 19:23