
I am trying to bulk insert into tables "TU" and "TUV" (a one-to-many relationship) without duplicating records on the "TU" table's "HashText" column, so I did this:

    /// <summary>
    /// Check if the selected TranslationUnit already exists in the db.
    /// </summary>
    /// <param name="translationUnit"></param>
    /// <returns></returns>
    public bool CheckDuplicate(TranslationUnit translationUnit)
    {
        // Any() translates to an EXISTS query, which is cheaper than
        // materializing a full Count() just to compare it with zero.
        return db.TUs.Any(tu => tu.HashText == translationUnit.Hash
                                && TmIDList.Contains(tu.TMID));
    }

This opens a connection to the database for every single record, which takes a lot of time.
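One way to reduce the per-record round trips (a sketch only, reusing the `HashText`/`Hash` and `TMID` names from the code above) is to fetch the existing hashes for the relevant TMs once, then check candidates against an in-memory set:

    // Sketch: one query up front, then purely in-memory duplicate checks.
    // Note: with millions of rows this set can get large; it trades memory
    // for round trips.
    var existingHashes = new HashSet<string>(
        db.TUs.Where(tu => TmIDList.Contains(tu.TMID))
              .Select(tu => tu.HashText));

    bool IsDuplicate(TranslationUnit translationUnit)
        => existingHashes.Contains(translationUnit.Hash);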

Then I insert the remaining items with AddRange(), saving every 10,000 records (to save connection time), like this:

            foreach (TranslationUnit item in Options.TMXFile.Data.Body.TranslationUnits.ToList())
            {
                TU tu = FillTU(TMID, item);
                ReadyTUs.Add(tu);
                tu.TUVs = new List<TUV>();
                foreach (TranslationUnitVariant item1 in item.TranslationUnitVariants)
                {
                    tu.TUVs.Add( FillTUV(item1));
                }
                if (ReadyTUs.Count >= 10000)
                {
                    Save();
                }
            }
            Save();

this is the save function:

    private void Save()
    {
        db.TUs.AddRange(ReadyTUs);
        db.SaveChanges();
        ReadyTUs.Clear();
        db.Dispose();
        db = new GTMEntities();
    }
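Side note: when adding many entities in one batch, EF's automatic change detection can dominate the cost, since DetectChanges() otherwise runs per tracked entity. A hedged variant of the Save() method above (assuming EF6 and the same GTMEntities context) turns it off for the bulk phase:

    private void Save()
    {
        // Assumption: EF6, where DbContext.Configuration exposes this flag.
        // Disabling it makes AddRange() over thousands of entities much cheaper.
        db.Configuration.AutoDetectChangesEnabled = false;
        try
        {
            db.TUs.AddRange(ReadyTUs);
            db.SaveChanges();
        }
        finally
        {
            ReadyTUs.Clear();
            db.Dispose();
            db = new GTMEntities(); // a fresh context also resets the change tracker
        }
    }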

My main problem now is time and the number of connections to the DB. So if there is some way to bulk insert into the TU and TUV tables without duplicated TUs -- tables that contain millions of records -- it would be much better.

  • One idea is to bulk insert into a staging table TU_RAW, for example, everything. Then do the actual insert into TU using a stored procedure. That usually decreases time and lowers db connections. – Jesse Jul 06 '17 at 14:26
  • If you can use SQL SERVER directly... it would be faster than Entity framework – Antoine Pelletier Jul 06 '17 at 14:32

1 Answer


One option would be to use SqlBulkCopy.

Basically you bulk insert all data into a staging table.
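A minimal sketch of that bulk copy step (the staging table name `dbo.TU_Staging` and the pre-filled `DataTable` are assumptions, not from the question):

    // 'table' is a System.Data.DataTable whose columns match TU_Staging.
    using (var bulk = new SqlBulkCopy(connectionString))
    {
        bulk.DestinationTableName = "dbo.TU_Staging"; // hypothetical staging table
        bulk.BatchSize = 10000;                       // stream in chunks
        bulk.WriteToServer(table);
    }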

Then join the main and the staging tables and DELETE all records in the staging table that match.

Then INSERT into the real table with the results of a SELECT DISTINCT from the staging table.
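The two SQL steps above could look roughly like this (column names are assumptions based on the question; adjust to the real schema):

    -- 1) Remove staged rows that already exist in the real table.
    DELETE s
    FROM TU_Staging AS s
    INNER JOIN TU AS t
        ON t.HashText = s.HashText AND t.TMID = s.TMID;

    -- 2) Insert the survivors, de-duplicated within the staging data itself.
    INSERT INTO TU (HashText, TMID /*, other columns */)
    SELECT DISTINCT HashText, TMID /*, other columns */
    FROM TU_Staging;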

mjwills