
I have a console application that reads emails and their attachments. I save the attachments to disk successfully, and now I need to update the stock levels of products in my products table from the CSV attachment.

I am looking for the best method of doing this, as the CSV file has about 8,000 rows and for this supplier we will have only about 1,000 products to update. So should I load everything into a table and delete the unwanted rows, or have the console application read each line and, if the product is in my DB table, insert it into a temp table, or just do the update straight away?

I don't want to cause bottlenecks, timeouts, etc., as these files of about 3 MB will be collected as often as every 10 minutes.

To read the CSV I am looking at a few different methods.

Thanks

Jonnymaboy

1 Answer


I've written a number of similar applications.

In my experience, it's best to treat the CSV files as untrusted: they may be corrupted, the exporting application may have bugs, and the data may not be consistent. Many of the validation steps in my systems have required access to data that lives in the SQL database.

So, the steps I usually take:

  • (.NET app): check the CSV file for consistency (e.g. ask the exporting application for a checksum, a row count, or a file size).
  • (.NET app): read in the CSV file.
  • (.NET app): for each line in the CSV file:
    • validate the line (column types and nullability)
    • import the line into a database holding table (see the sketch after this list).
  • Once all records are imported:
    • (SQL): run data validation against the holding table (foreign key constraints, business/data rules).
    • (SQL): move data from the holding table to the production table.
    • (SQL): update other production data as required (e.g. update stock levels).
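
To make the .NET half concrete, here is a minimal C# sketch of the read/validate/import loop. The two-column layout (SKU, stock level), the holding table name dbo.ProductImport, and the connection string are all assumptions for illustration, not part of your schema. TextFieldParser (add a reference to the Microsoft.VisualBasic assembly) is used because it copes with quoted fields, but any CSV reader will do:

    using System;
    using System.Data;
    using System.Data.SqlClient;
    using Microsoft.VisualBasic.FileIO; // needs a reference to Microsoft.VisualBasic

    class HoldingTableImporter
    {
        // Illustrative connection string; replace with your own.
        const string ConnStr = "Server=.;Database=Stock;Integrated Security=true";

        public static void Import(string csvPath)
        {
            // Shape must match the holding table (dbo.ProductImport here).
            var table = new DataTable();
            table.Columns.Add("Sku", typeof(string));
            table.Columns.Add("StockLevel", typeof(int));

            using (var parser = new TextFieldParser(csvPath))
            {
                parser.TextFieldType = FieldType.Delimited;
                parser.SetDelimiters(",");

                while (!parser.EndOfData)
                {
                    string[] fields = parser.ReadFields();

                    // Per-line validation: column count, types, nullability.
                    // A header line, if present, fails int.TryParse and is skipped.
                    if (fields == null || fields.Length != 2) continue;
                    if (string.IsNullOrWhiteSpace(fields[0])) continue;
                    int stockLevel;
                    if (!int.TryParse(fields[1], out stockLevel)) continue;

                    table.Rows.Add(fields[0].Trim(), stockLevel);
                }
            }

            // One bulk copy beats ~8,000 individual INSERTs.
            using (var conn = new SqlConnection(ConnStr))
            {
                conn.Open();
                using (var bulk = new SqlBulkCopy(conn))
                {
                    bulk.DestinationTableName = "dbo.ProductImport";
                    bulk.WriteToServer(table);
                }
            }
        }
    }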

By using a transaction around the SQL queries, you can roll back the whole import routine if you encounter an error.
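
As a rough sketch of that transactional wrapper in ADO.NET (again with illustrative table names, and a simple foreign-key check and stock update standing in for your real validation and move steps):

    using System;
    using System.Data.SqlClient;

    class ImportFinaliser
    {
        // Runs the set-based SQL steps as one unit of work, so a failure
        // at any step rolls the whole import back.
        public static void FinaliseImport(SqlConnection conn)
        {
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                try
                {
                    // Validation example: reject the batch if any imported
                    // row references a product we don't know about.
                    var check = new SqlCommand(
                        @"SELECT COUNT(*) FROM dbo.ProductImport i
                          LEFT JOIN dbo.Products p ON p.Sku = i.Sku
                          WHERE p.Sku IS NULL", conn, tx);
                    if ((int)check.ExecuteScalar() > 0)
                        throw new InvalidOperationException("Unknown SKUs in import.");

                    // Set-based update: one statement touching ~1,000 rows,
                    // not ~1,000 statements touching one row each.
                    new SqlCommand(
                        @"UPDATE p SET p.StockLevel = i.StockLevel
                          FROM dbo.Products p
                          JOIN dbo.ProductImport i ON i.Sku = p.Sku", conn, tx)
                        .ExecuteNonQuery();

                    // Empty the holding table ready for the next file.
                    new SqlCommand("DELETE FROM dbo.ProductImport", conn, tx)
                        .ExecuteNonQuery();

                    tx.Commit();
                }
                catch
                {
                    tx.Rollback();
                    throw;
                }
            }
        }
    }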

This approach allows me to validate the data from an untrusted source against my reference data in SQL, and because the validation runs against all imported data at once rather than row by row, it's fast and efficient. Transaction management is also relatively straightforward with a small number of SQL statements that each affect many rows (as opposed to many SQL statements each affecting one row). Again, this tends to be more efficient.

Neville Kuyt
  • Thank you very much for your help. You have confirmed my thoughts. – Jonnymaboy Jan 08 '14 at 23:07
  • Hi, just one last thing. I agree with writing each line from the CSV to the DB, but why do this rather than saving the CSV to a specific folder and then calling a dtsx package that imports the file for me? Thanks – Jonnymaboy Jan 10 '14 at 14:08