0

I have two text files containing strings on each line. I compare these text files to find which strings have been added and which have been deleted.

Each day roughly 100,000 strings are added and 100,000 strings are deleted.

I currently use System.Data.SqlClient.SqlBulkCopy to add new rows. This adds the 100,000 new rows in roughly 1 second.

However, I cannot see a similar way to delete rows. I have 100,000 strings to remove from a table of 14 million rows. Even attempting to remove 5000 in one go using IN (x,x,x,x) results in a timeout after a minute.

Is there an efficient method in c# to remove these strings?

Ross
  • 33
  • 3
  • 1
    Without any code, I assume it's an issue with your SQL DB and Table definition. – Markus Deibel Nov 14 '19 at 12:49
  • Insert in a temp table. Use a direct Sql statement . – xdtTransform Nov 14 '19 at 12:49
  • 2
    It's not going to be a way in c# but a way in sql. You need a way to find all 100k of them (an index will be useless when checked against 100k values directly, but if you can index those values, then maybe), and you'll want to drop as many indexes as you can before deleting, and recreate them after, as indexes will slow deletions down massively. – 404 Nov 14 '19 at 12:51
  • similar question [here] (https://stackoverflow.com/questions/54403944/c-sharp-efficiently-delete-50000-records-in-batches-using-sqlbulkcopy-or-equival) – Sushant Yelpale Nov 14 '19 at 12:52
  • Which dbms are you using? – jarlh Nov 14 '19 at 13:04

2 Answers2

2

You could insert the string that you want to remove in another table.

Then use a delete query which joins to that table to remove them all at once.

delete t
from your_table t
join helper_table h on t.name = h.name
juergen d
  • 201,996
  • 37
  • 293
  • 362
0

Yes same experience for me too in the past. I offer to delete partially, in otherwise you will have big terrible every time.

You can execute partially SQL example.

--Example Read first 2.000 (your choice) record and execute delete

using (StreamReader sr = File.OpenText(filepath))
{
     List<string>lines;
     while ((string line = sr.ReadLine()) != null)
     {    
          lines.Add(line);      
          lineCount++;

if(lineCount%2000==0)
{ 
Execute Delete Here
lines.Clear(); //dont forget.
}

     }
Execute rest of lines delete//

}

for second option I can offer BATCH delete on SQL:

Deleting 1 millions rows in SQL Server

wikiCan
  • 449
  • 3
  • 14