19

I use entity framework migration (in Automatic migration mode). Everything is okay, but I have one question:

How should I seed data when I have many-to-many relationships?

For example, I have two model classes:

public class Parcel
{
    public int Id { get; set; }
    public string Description { get; set; }
    public double Weight { get; set; }
    public virtual ICollection<BuyingItem> Items { get; set; }
}

public class BuyingItem
{
    public int Id { get; set; }
    public decimal Price { get; set; }
    public virtual ICollection<Parcel> Parcels { get; set; }
}

I understand how to seed simple data (for PaymentSystem class) and one-to-many relationships, but what code should I write in the Seed method to generate some instances of Parcel and BuyingItem? I mean using DbContext.AddOrUpdate(), because I don't want to duplicate data every time I run Update-Database.

protected override void Seed(ParcelDbContext context)
{
    context.AddOrUpdate(ps => ps.Id,
        new PaymentSystem { Id = 1, Name = "Visa" },
        new PaymentSystem { Id = 2, Name = "PayPal" },
        new PaymentSystem { Id = 3, Name = "Cash" });
}

protected override void Seed(Context context)
{
    base.Seed(context);

    // This will create Parcel, BuyingItems and relations only once
    context.AddOrUpdate(new Parcel() 
    { 
        Id = 1, 
        Description = "Test", 
        Items = new List<BuyingItem>
        {
            new BuyingItem() { Id = 1, Price = 10M },
            new BuyingItem() { Id = 2, Price = 20M }
        }
    });

    context.SaveChanges();
}

This code creates Parcel, BuyingItems and their relationship, but if I need the same BuyingItem in another Parcel (they have a many-to-many relationship) and I repeat this code for the second parcel - it will duplicate BuyingItems in the database (though I set the same Ids).

Example:

protected override void Seed(Context context)
{
    base.Seed(context);

    context.AddOrUpdate(new Parcel() 
    { 
        Id = 1, 
        Description = "Test", 
        Items = new List<BuyingItem>
        {
            new BuyingItem() { Id = 1, Price = 10M },
            new BuyingItem() { Id = 2, Price = 20M }
        }
    });

    context.AddOrUpdate(new Parcel() 
    { 
        Id = 2, 
        Description = "Test2", 
        Items = new List<BuyingItem>
        {
            new BuyingItem() { Id = 1, Price = 10M },
            new BuyingItem() { Id = 2, Price = 20M }
        }
    });

    context.SaveChanges();
}

How can I add the same BuyingItem in different Parcels?

Lauren Rutledge
  • 1,195
  • 5
  • 18
  • 27
Dmitry Gorshkov
  • 3,523
  • 4
  • 18
  • 20

3 Answers3

20

Updated Answer

Make sure you read "Using AddOrUpdate Properly" section below for a complete answer.

First of all, let's create a composite primary key (consisting of parcel id and item id) to eliminate duplicates. Add the following method in the DbContext class:

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    base.OnModelCreating(modelBuilder);

    modelBuilder.Entity<Parcel>()
        .HasMany(p => p.Items)
        .WithMany(r => r.Parcels)
        .Map(m =>
        {
            m.ToTable("ParcelItems");
            m.MapLeftKey("ParcelId");
            m.MapRightKey("BuyingItemId");
        });
}

Then implement the Seed method like so:

protected override void Seed(Context context)
{
    context.Parcels.AddOrUpdate(p => p.Id,
        new Parcel { Id = 1, Description = "Parcel 1", Weight = 1.0 },
        new Parcel { Id = 2, Description = "Parcel 2", Weight = 2.0 },
        new Parcel { Id = 3, Description = "Parcel 3", Weight = 3.0 });

    context.BuyingItems.AddOrUpdate(b => b.Id,
        new BuyingItem { Id = 1, Price = 10m },
        new BuyingItem { Id = 2, Price = 20m });

    // Make sure that the above entities are created in the database
    context.SaveChanges();

    var p1 = context.Parcels.Find(1);
    // Uncomment the following line if you are not using lazy loading.
    //context.Entry(p1).Collection(p => p.Items).Load();

    var p2 = context.Parcels.Find(2);
    // Uncomment the following line if you are not using lazy loading.
    //context.Entry(p2).Collection(p => p.Items).Load();

    var i1 = context.BuyingItems.Find(1);
    var i2 = context.BuyingItems.Find(2);

    p1.Items.Add(i1);
    p1.Items.Add(i2);

    // Uncomment to test whether this fails or not, it will work, and guess what, no duplicates!!!
    //p1.Items.Add(i1);
    //p1.Items.Add(i1);
    //p1.Items.Add(i1);
    //p1.Items.Add(i1);
    //p1.Items.Add(i1);

    p2.Items.Add(i1);
    p2.Items.Add(i2);

    // The following WON'T work, since we're assigning a new collection, it'll try to insert duplicate values only to fail.
    //p1.Items = new[] { i1, i2 };
    //p2.Items = new[] { i2 };
}

Here we make sure that the entities are created/updated in the database by calling context.SaveChanges() within the Seed method. After that, we retrieve the required parcel and buying item objects using context. Thereafter we use the Items property (which is a collection) on the Parcel objects to add BuyingItem as we please.

Please note, no matter how many times we call the Add method using the same item object, we don't end up with primary key violation. That is because EF internally uses HashSet<T> to manage the Parcel.Items collection. A HashSet<Item>, by its nature, won't let you add duplicate items.

Moreover, if you somehow manage to circumvent this EF behavior as I have demonstrated in the example, our primary key won't let the duplicates in.

Using AddOrUpdate Properly

When you use a typical Id field (int, identity) as an identifier expression with AddOrUpdate method, you should exercise caution.

In this instance, if you manually delete one of the rows from the Parcel table, you'll end up creating duplicates every time you run the Seed method (even with the updated Seed method I have provided above).

Consider the following code,

context.Parcels.AddOrUpdate(p => p.Id,
    new Parcel { Id = 1, Description = "Parcel 1", Weight = 1.0 },
    new Parcel { Id = 2, Description = "Parcel 1", Weight = 1.0 },
    new Parcel { Id = 3, Description = "Parcel 1", Weight = 1.0 }
);

Technically (considering the surrogate Id here), the rows are unique, but from the end-user point of view, they are duplicates.

The true solution here is to use the Description field as an identifier expression. Add this attribute to the Description property of the Parcel class to make it unique: [MaxLength(255), Index(IsUnique=true)]. Update the following snippets in the Seed method:

context.Parcels.AddOrUpdate(p => p.Description,
    new Parcel { Description = "Parcel 1", Weight = 1.0 },
    new Parcel { Description = "Parcel 2", Weight = 2.0 },
    new Parcel { Description = "Parcel 3", Weight = 3.0 });

// Make sure that the above entities are created in the database
context.SaveChanges();

var p1 = context.Parcels.Single(p => p.Description == "Parcel 1");

Note, I'm not using the Id field as EF is going to ignore it while inserting rows. And we are using Description to retrieve the correct parcel object, no matter what Id value is.


Old Answer

I would like to add a few observations here:

  1. Using Id is probably not going to do any good if the Id column is a database generated field. EF is going to ignore it.

  2. This method seems to be working fine when the Seed method is run once. It won't create any duplicates, however, if you run it for a second time (and most of us have to do that often), it may inject duplicates. In my case it did.

This tutorial by Tom Dykstra showed me the right way of doing it. It works because we don't take anything for granted. We don't specify IDs. Instead, we query the context by known unique keys and add related entities (which again are acquired by querying context) to them. It worked like a charm in my case.

Lauren Rutledge
  • 1,195
  • 5
  • 18
  • 27
Ravi M Patel
  • 2,905
  • 2
  • 23
  • 32
  • 1
    I used the method in the tutorial mentioned above and it worked. – tekiegirl Oct 29 '14 at 13:12
  • The marked answer does not account for identity field primary keys. The Seed method from the Tom Dykstra tutorial does, so it is a much more robust solution. – Brad Mathews Mar 10 '16 at 23:04
  • Please consider adding some more detail to your answer other than just a link. – mikesigs Jul 11 '16 at 16:11
  • @mikesigs Thanks, I have updated my answer. Is it good enough now? – Ravi M Patel Jul 11 '16 at 19:46
  • It's definitely better. You shouldn't specify IDs in the initializers though, and definitely shouldn't use it as the identifierExpression in the AddOrUpdate call. For Parcel you're better off using the Description (but will want to enforce uniqueness on that column too then) and for BuyingItems, well... there's not a good option there. But that's a domain issue, not an EF issue. – mikesigs Jul 11 '16 at 20:04
  • @mikesigs, you're right. The reason why I kept the IDs intact was to show how one can do it without changing the given model in this case. I could use `Description` field in case of `Parcel` but I had no choice with `BuyingItem`. Now I've added a section demonstrating proper use of the AddOrUpdate method with a modified `Parcel` class having a unique key index on `Description` field so that one case use it to retrieve unique `Parcel` objects. Similarly, one can fix the design flaw of `BuyingItem` class too. – Ravi M Patel Jul 11 '16 at 20:27
  • 1
    Perfect. So many people don't understand the pitfalls of using Id as the identifierExpression. Now hopefully someone will come and mark this as the answer! – mikesigs Jul 11 '16 at 20:30
19

You must fill many-to-many relation in the same way as you build many-to-many relation in any EF code:

protected override void Seed(Context context)
{
    base.Seed(context);

    // This will create Parcel, BuyingItems and relations only once
    context.AddOrUpdate(new Parcel() 
    { 
        Id = 1, 
        Description = "Test", 
        Items = new List<BuyingItem>
        {
            new BuyingItem() { Id = 1, Price = 10M },
            new BuyingItem() { Id = 2, Price = 20M }
        }
    });

    context.SaveChanges();
}

Specifying Id which will be used in database is crucial otherwise each Update-Database will create new records.

AddOrUpdate doesn't support changing relations in any way so you cannot use it to add or remove relations in next migration. If you need it you must manually remove relation by loading Parcel with BuyingItems and calling Remove or Add on navigation collection to break or add new relation.

Ladislav Mrnka
  • 360,892
  • 59
  • 660
  • 670
  • Is it still valid that `AddOrUpdate` doesn't support changing relations in EF Migrations final release? It would explain the problem in this question http://stackoverflow.com/q/10474839/270591, wouldn't it? – Slauma May 07 '12 at 18:27
  • 7
    This doesn't seem to work as expected if BuyingItem.Id is an identity column. In that case, when the item does not exist, it is inserted with an ID generated by the DB instead of the ID you specified. This means AddOrUpdate with hard coded IDs are not a viable solution for seeding, because it will generated duplicates on the next seed run. Tested on EF5. – angularsen Jan 06 '13 at 15:36
  • Ladislav, I'm not seeing this work with the `Parcel.Items.Add()` method on the related entity relationship as you suggested when the root entity and the related entity exist already in the DB (in short, when we're trying to simply update the relationship solely). When calling `Parcel.Items.Add(new BuyingItem() { Id = 1, Price = 10M })`, the item is not added as would be expected after calling `context.SaveChanges()`. – Ryan Griffith Jun 28 '14 at 00:59
  • 1
    As I suppose the Id is identity column and will be generated in dbms. In this case you can't set it to value you want. So there is no way to be sure that there will be no duplicates when EF will run seeding again, because you don't have control over Ids. For example you have 3 parcels (parcel1, parcel2, parcel3) and you removed parcel2 from seeding. You will have duplicated parcel3 if you will decide to create your db again or run your app on other instance. The problem arise when Ids used in seeding code become out of sync with Ids in db. – Vladislav Kostenko Aug 13 '14 at 16:40
  • 4
    -1 as this is wrong and wrongly accepted answer: manually set ids are ignored, so this will not work and solve the problem. The OP has an answer with the right solution below. – Michael Sagalovich Aug 21 '15 at 09:30
3

Ok. I understand how I should be in that situation:

protected override void Seed(Context context)
{
    base.Seed(context);
    var buyingItems = new[]
    {
        new BuyingItem
        {
             Id = 1,
             Price = 10m
        },
        new BuyingItem
        {
             Id = 2,
             Price = 20m,
        }
    }

    context.AddOrUpdate(new Parcel() 
    { 
        Id = 1, 
        Description = "Test", 
        Items = new List<BuyingItem>
        {
            buyingItems[0],
            buyingItems[1]
        }
    },
    new Parcel() 
    { 
        Id = 2, 
        Description = "Test2", 
        Items = new List<BuyingItem>
        {
            buyingItems[0],
            buyingItems[1]
        }
    });

    context.SaveChanges();
}

There are no duplicates in database.

Thank you, Ladislav, you gave me a right vector to find a solution for my task.

Dmitry Gorshkov
  • 3,523
  • 4
  • 18
  • 20