2

I have around 6000 rows and 600 columns in my Excel worksheet. There is an ID column where I am highlighting differentiated values in all cells if the IDs are equal. But my code is taking too long to loop through each row and highlight. How my code could be optimized?

 private void worker_DoWork(object sender, System.ComponentModel.DoWorkEventArgs e)
{
    for (r = 2; r <= totalRow; r++)
    {
        Duplicate();
        int percentage = (r + 1) * 100 / totalRow;
        worker.ReportProgress(percentage);
    }             
}

private void Duplicate()
{
    if (xlWorksheet.Cells[r, ssid].Value == xlWorksheet.Cells[r + 1, ssid].Value)
    {
        for (int c = 1; c <= columnCount; c++)
        {
            if (Convert.ToString(xlWorksheet.Cells[r, c].Value) != Convert.ToString(xlWorksheet.Cells[r + 1, c].Value))
            {
                Excel.Range cellRange = (Excel.Range)xlWorksheet.Cells[r + 1, c];
                cellRange.Interior.Color = Excel.XlRgbColor.rgbRed;
            }
        }
    }
}
Firdavs Kurbonov
  • 1,252
  • 4
  • 16
  • 42
  • 1
    Application.ScreenUpdating = false; before the work then back to true after can have a big effect. For best results use range objects and convert them to arrays for reading/comparison. E.g. https://stackoverflow.com/questions/22760591/convert-excel-range-to-array-using-c-sharp? – Alex K. Apr 04 '18 at 12:46
  • I added Application.ScreenUpdating = false; before work and changed to true after. It seems no impact on performance. – Firdavs Kurbonov Apr 05 '18 at 05:40

2 Answers2

3

Reading and writing to excel is what slows the program quite a bit. Try to avoid the reading and the unnecessary writing:

  • Read the Excel file once and write its values in a 2-dimensional matrix, with dimensions row and column;
  • Loop through the matrix and check the values. For 3.6M it will be fast;
  • Record the results in another 2-dimensional boolean matrix.
  • Loop through the booelan matrix and write only on the cells which you need to write;
  • You can even do something better - loop through the boolean matrix and assign the corresponding cells to a range with Union(). At the end change the color of the range only once.

Following the above points (without the last one), this is a basic start:

enter image description here

And this is a basic end, considering that our task is to color equal cells for every row (this task can be easily changed):

enter image description here

This is the code:

using System;
using Excel = Microsoft.Office.Interop.Excel;

class StartUp
{
    static void Main()
    {
        string filePath = @"C:\Sample.xlsx";

        int rowsCount = 5;
        int colsCount = 6;

        Excel.Application excel = new Excel.Application();
        excel.Visible = false; 
        excel.EnableAnimations = false;

        Excel.Workbook wkb = Open(excel, filePath);
        Excel.Worksheet wk = (Excel.Worksheet)excel.Worksheets.get_Item(1);

        Excel.Range startCell = wk.Cells[1, 1];
        Excel.Range endCell = wk.Cells[rowsCount, colsCount];
        Excel.Range currentRange = wk.get_Range(startCell, endCell).Cells;
        currentRange.Interior.Color = Excel.XlRgbColor.rgbWhite;

        object[,]  matrixRead = (object[,])currentRange.Value;
        bool[,] matrixResult = new bool[rowsCount+1,colsCount+1];

        for (int rows = 1; rows <= rowsCount; rows++)
        {
            for (int cols = 1; cols < colsCount; cols++)
            {
                if (matrixRead[rows,cols].ToString()==matrixRead[rows,cols+1].ToString())
                {
                    matrixResult[rows, cols] = true;
                    matrixResult[rows, cols + 1] = true;
                }
            }
        }

        for (int rows = 1; rows <= rowsCount; rows++)
        {
            for (int cols = 1; cols <= colsCount; cols++)
            {
                if (matrixResult[rows, cols])
                {
                    currentRange.Cells[rows, cols].interior.color = 
                                                Excel.XlRgbColor.rgbRed;
                }                
            }
        }

        excel.EnableAnimations = true;
        wkb.Close(true);
        excel.Quit();
        Console.WriteLine("Finished!");
    }

    private static Excel.Workbook Open(Excel.Application excelInstance, 
            string fileName, bool readOnly = false, 
            bool editable = true, bool updateLinks = true)
    {
        Excel.Workbook book = excelInstance.Workbooks.Open(
            fileName, updateLinks, readOnly,
            Type.Missing, Type.Missing, Type.Missing, Type.Missing, Type.Missing,
            Type.Missing, editable, Type.Missing, Type.Missing, Type.Missing,
            Type.Missing, Type.Missing);
        return book;
    }
}

About 95% of the speed (or much more by 3.6M cells) probably comes from these two lines:

excel.Visible = false; 
excel.EnableAnimations = false;

In general the two pairs of nested loops can be avoided and made into one pair quite easily - like this:

for (int rows = 1; rows <= rowsCount; rows++)
{
    for (int cols = 1; cols < colsCount; cols++)
    {
        if (matrixRead[rows,cols].ToString()==matrixRead[rows,cols+1].ToString())
        {
            currentRange.Cells[rows, cols].interior.color = Excel.XlRgbColor.rgbRed;
            currentRange.Cells[rows, cols+1].interior.color = Excel.XlRgbColor.rgbRed;
        }
    }
}

But the idea was to present the matrixResult as one range through Union() and update the background of the whole range at once. Thus, the second pair of nested loops should not exist, but instead it should be something like: currentRange.Interior.Color = Excel.XlRgbColor.rgbRed

Vityata
  • 42,633
  • 8
  • 55
  • 100
-1

try to read the whole excel file in dataset. then you can use below code to

Basic Example :

DataSet ResultDataSet= new DataSet();
DataTable customers = myDataset.Tables.Add("Customers");
customers.Columns.Add("Name");
customers.Columns.Add("Age");
customers.Rows.Add("Chris", "25");
int records=10;   // split the  10 records per table. if 50 records will be there then 5 tables will generate.

var splittedTables = ResultDataSet.AsEnumerable()
                                .Select((row, index) => new { row, index })
                                .GroupBy(x => x.index / records)
                                .Select(g => g.Select(x => x.row).CopyToDataTable())
                                .ToArray();

try this can work fast compare to foreach statement.

rene
  • 41,474
  • 78
  • 114
  • 152
  • 2
    Your answer is great without a link to your own website – rene Apr 04 '18 at 13:22
  • 1
    Just linking to your own library or tutorial is not a good answer. Linking to it, explaining why it solves the problem, providing code on how to do so and disclaiming that you wrote it makes for a better answer. See: [**What signifies “Good” self promotion?**](//meta.stackexchange.com/q/182212) – rene Apr 04 '18 at 13:23
  • Can you explain how to read the excel file into a dataset - I visited your website expecting to find some useful material but failed. – PaulF Apr 04 '18 at 13:35
  • hi, you can refer blog for refer the code. http://www.dotnetbasic.com/2017/11/split-tables-into-multiple-tables-using.html – dotnetbasic Apr 04 '18 at 13:35
  • That is just a copy of the code above - I am asking how to get the contents of the excel file into the Dataset in the first place. Also, how does splitting a table into multiple tables address the question the OP is asking? – PaulF Apr 04 '18 at 13:39
  • pls refer :https://stackoverflow.com/questions/18006318/how-to-import-all-the-excel-sheets-to-dataset-in-c-sharp – dotnetbasic Apr 04 '18 at 13:43
  • 1
    As OP needs to use Office interop to update the colours of cells, and interop can be used to read the worksheet directly into an array, what is the benefit of using a second technology (OleDb) to read the data into an intermediate DataSet which then needs converting to an array? You still haven't explained how splitting the dataset into multiple tables addresses what the OP is trying to do. – PaulF Apr 04 '18 at 14:00