0

I have a requirement where I need to check for data duplication before the data is sent to database.

Basically, we have a EXCEL file which contains the required data in columns and sheets. We need to upload the data from EXCEL to our SQL Server 2008 DB.

Before uploading the data, we load the EXCEL data using "Microsoft.Ace.OLEDB.12.0" and display to the users using WPF Grid - This works fine.

The next step is to check the data loaded from EXCEL file against the data available in SQL Server DB. We have a primary column called invoice number. This invoice number is spread across all the excel file as the shipment of requested goods is done partially. Till the entire items for this particular invoice is uploaded to the DB we need to make sure the data which is being uploaded is not already present in SQL Server DB.

The excel file has multiple columns but we need to check based on couple of columns for duplicate entry.

We are storing the loaded EXCEL data in .net datatable and planning to load the data from SQL Server to Dataset (Not sure if this is a good approach)

What is the suggested good practice to achieve this. I have seen this link Duplicacy check between DATABASE and DATATABLE but not sure if this would solve my requirement.

Thanks in advance.

Community
  • 1
  • 1
aioracle
  • 371
  • 2
  • 7
  • 20
  • used a stored procedure then you can specify if not exist (select * from table where column =@parameter) then do whatever – zxc Oct 22 '13 at 05:34
  • I think the link you provided is your solution. What makes u unsure? With the inner join (or as an alternative LinQs Intersect) you can find duplicate rows and specify the columns you want to compare. – Dannydust Oct 22 '13 at 05:41
  • @Dannydust, the unsure is because of the Dataset and Datatable approach I am using. I do know the LINQ to DB should work seamlessly but still have that doubt because I am checking against the live data which may be refreshed on a timely basis. – aioracle Oct 22 '13 at 07:55

2 Answers2

0

I would try either:

SELECT 'Exists' FROM table where id = @itemId --returns string 'Exists' if there is an item

Or

SELECT ISNULL(id, '0') FROM table where id = @itemId -- returns '0' if there is an item

Or

SELECT ISNULL(id, 0) FROM table where id = @itemId -- returns 0 if there is an item 

If you need to check it on multiple tables just join them and do the same.

Another aproach is the one you have provided with the link, you could SELECT the value you want to compare, fill DataTable with it then loop over it, like this:

DataTable dt = new DataTable();
SqlDataAdapter adapt = new SqlDataAdapter("SELECT id FROM table", connection);
adapt.Fill(dt);

for (int i = 0; i < excelDataTable.Rows.Count; i++)
{
    for (int j = 0; j < dt.Rows.Count; j++)
    {
        if (excelDataTable.Rows[i]["col_name"] == dt.Rows[j]["col_name"]) ;
    }
}

And finally you can check this:

Another question with answer

Community
  • 1
  • 1
Tafari
  • 2,639
  • 4
  • 20
  • 28
0

I do not know the background but if you are using SQL Server there is a great utility called SQL Server Integration Services (SSIS). It makes data migration (called ETL: extract-transform-load) really easy. I know that you would have to learn a new utility but if you are using SQL Server then it may be really helpful in future!

Using SSIS you can make a workflow with Excel file as a source and load the data to SQL Server as a destination. You would end up with something like this:

enter image description here

Pellared
  • 1,242
  • 2
  • 14
  • 29
  • the solution seems good, but what are the security implications to access and run the SSIS Project. I haven't used this tool any time but can surely try it out. But, I have heard you need administrator permission to run this. So if the end user uploads the EXCEL file what are the settings I should be doing at SQL Server side so the SSIS package runs without any hurdle. – aioracle Oct 22 '13 at 07:52