0

I'm writing an application in c# that automates a comparison of values in two different databases.

The data is provided in two .xlsx files. These are loaded by the end user into the application. An in memory DataTable is then created from both files. One for each.

Now I need to compare column 1 from DataTable one with column 3 from DataTable two. The order of contents is not the same but they are all, always numeric values. Theres a couple of possible outcomes:

  • All cells match (but not in order). In other words, all cells that are in Column 1 from DataTable one are also in column 3 from DataTable two. There are no other cells present.

  • There is a cell present in column 1 from DataTable one which is not present in column 3 from DataTable two. In this case I want to write these cells to a seperate worksheet. I'm using closedXML and a workbook is already created and present.

  • There is a cell present in column 3 from DataTable two which is not present in column 1 from DataTable one. In this case I want to write these cells to a seperate worksheet. I'm using closedXML and a workbook is already created and present.

I have thought about creating a function that accepts two parameters. Both DataTables. A comparison would then be done on the first column of table one and the third column of table 3.

Another solution which I am working on now pasting both DataTables to a worksheet via ClosedXML. Extracting the relevant column (1 and 3) to another seperate worksheet and using a for each loop to compare all cells from both worksheets and get the differences. But I dont know how to go on about this as far as get the content which is not in column 1 but its in column 3 to worksheet A and the content which is in column 1 but not in column 3 to worksheet B.

JB1989
  • 35
  • 6

1 Answers1

1

You can use Linq to find what is in the column 1 and not in column 3 and vice-versa.

You just have to add System.Data.DataSetExtensions in reference of your project like here

And then the Linq query :

var dataTable1 = new DataTable();
dataTable1.Columns.Add(new DataColumn("Data Column 1"));

dataTable1.Rows.Add(1, "", "");
dataTable1.Rows.Add(3, "", "");
dataTable1.Rows.Add(5, "", "");
dataTable1.Rows.Add(7, "", "");
dataTable1.Rows.Add(9, "", "");

var dataTable2 = new DataTable();
dataTable2.Columns.Add(new DataColumn("Data Column 1"));
dataTable2.Columns.Add(new DataColumn("Data Column 2"));
dataTable2.Columns.Add(new DataColumn("Data Column 3"));

dataTable2.Rows.Add("", "", 1);
dataTable2.Rows.Add("", "", 2);
dataTable2.Rows.Add("", "", 4);
dataTable2.Rows.Add("", "", 6);
dataTable2.Rows.Add("", "", 7);
dataTable2.Rows.Add("", "", 8);
dataTable2.Rows.Add("", "", 9);

// Use the id of the column
var column1 = dataTable1.AsEnumerable().Select(r => r.ItemArray[0]).ToList();
var column3 = dataTable2.AsEnumerable().Select(r => r.ItemArray[2]).ToList();

// Use the name of the column
var column1WithName = dataTable1.AsEnumerable().Select(r => r.Field<string>("Data Column 1")).ToList();
var column3WithName = dataTable2.AsEnumerable().Select(r => r.Field<string>("Data Column 3")).ToList();

// All element of the Column 1 (DataTable1) that are not in Column 3 (DataTable2)
var column1Without3 = column1.Except(column3).ToList(); // [ "3", "5" ]
// All element of the Column 3 (DataTable2) that are not in Column 1 (DataTable1)
var column3Without1 = column3.Except(column1).ToList(); // [ "2", "4", "6", "8" ]
Guilhem Prev
  • 979
  • 5
  • 17
  • Thanks.It is now working as far as the comparison goes. But its always taking the last column of the DataTable. The first DataTable has a total 4 column and the second DataTable has a total of 6 columns. The first column from the first DataTable and the third column from the second DataTable are the ones that contain comparable values. I tried changing your code to direct index references but it still takes the last column of the DataTables for the actual comparison (so it takes the fourth column from the first DataTable and the sixth column from the second DataTable) – JB1989 Jul 26 '19 at 12:35
  • @JB1989 Did you change the ItemArray[] for the 2 variable column1 and column3 ? You can also try to use the name of the column. I add 2 column in datatable1 and the code still works – Guilhem Prev Jul 26 '19 at 12:44
  • I did. It does not matter if I use the name of the column via a string or the index position. It always takes the last column from both DataTables. – JB1989 Jul 26 '19 at 12:59
  • @JB1989 Can you provide more detail about your datatables ? I can't help you if it works with my 2 tables and not yours – Guilhem Prev Jul 26 '19 at 13:00
  • I understand. Thank you for your help so far. I actually figured it out myself using a traditional for each loop after extracting the relevant columns directly from the DataTable and then using the LINQ except method you provided. For some reason the lambda expression was not working correctly. Try a DataTable with 5 and a DataTable with 7 columns and then try to retrieve columns 1 and 3 using the lambda and the Linq except method. It will give other input then intended. – JB1989 Jul 30 '19 at 06:04
  • @JB1989 The number of columns in the datatable didn't change anything, you probably have a different datatable than the basic example that I give in the answer. – Guilhem Prev Jul 30 '19 at 07:32