0

I have this c# program where I am able to export data to an excel file. It calls the stored procedure in sql server and returns the results to the excel file (based on the filter I choose).

Right now, I am able to export it successfully but if I try to extract way more data ( 20,000 records or so) from a 20 million record table, it is taking way way too long. I added a stopwatch to the code and found that the foreach loop is the culprit here.

Here is my code:

private void ExportExcel(SqlDataReader dr)
    {
        try
        {
            DataTable dt = new DataTable();
            dt.Load(dr);
            SaveFileDialog saveFileDialog1 = new SaveFileDialog();
            saveFileDialog1.Filter = "Microsoft Office Excel Workbook (*.xls)|*.xls|All Files (*.*)|*.*";
            saveFileDialog1.FilterIndex = 1;
            saveFileDialog1.RestoreDirectory = true;

            if (saveFileDialog1.ShowDialog() == DialogResult.OK)
            {
                // Create an Excel object and add workbook...
                Excel.ApplicationClass excel = new Excel.ApplicationClass();
                Excel.Workbook workbook = excel.Application.Workbooks.Add(true); // true for object template???


               // var watch = System.Diagnostics.Stopwatch.StartNew();
                // Add column headings...
                int iCol = 0;
                int iVisibleColumnCount = 0;
                foreach (DataColumn c in dt.Columns)
                {
                    iCol++;
                    // counting visible columns
                    if (c.ColumnMapping != MappingType.Hidden)
                        iVisibleColumnCount++;
                    else    // hide the columns in excel if the column is hide in datatable
                    {
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.Hidden = true;
                        continue;
                    }
                    // Set column header text to bold
                    ((Excel.Range)excel.Cells[1, iCol]).Font.Bold = true;
                    excel.Cells[1, iCol] = c.ColumnName;

                    if (c.DataType == typeof(System.String))
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.NumberFormat = "@";
                    else if (c.DataType == typeof(System.Int16)
                        || c.DataType == typeof(System.Int32)
                        || c.DataType == typeof(System.Int64))
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.NumberFormat = "#,##0";
                    else if (c.DataType == typeof(System.TimeSpan))
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.NumberFormat = @"[$-409]hh:mm:ss AM/PM;@";
                    else if (c.DataType == typeof(System.DateTime))
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.NumberFormat = "yyyy-mm-dd hh:mm:ss";
                    else if (c.DataType == typeof(System.Decimal))
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.NumberFormat = @"#,##0.00_);[Red](#,##0.00)";
                    else
                        ((Excel.Range)excel.Cells[1, iCol]).EntireColumn.NumberFormat = "General";
                }
                // for each row of data...
                int iRow = 0;
                foreach (DataRow r in dt.Rows)
                {
                    iRow++;

                    // add each row's cell data...
                    iCol = 0;
                    foreach (DataColumn c in dt.Columns)
                    {
                        iCol++;
                        if (c.ColumnMapping != MappingType.Hidden)
                        {
                            if (c.DataType == typeof(DateTime))
                            {
                                DateTime date1 = (DateTime)r[c.ColumnName];
                                string DateTime = date1.ToString();

                                if (DateTime.Contains("AM"))
                                {
                                    excel.Cells[iRow + 1, iCol] = date1.ToString("yyyy-MM-dd hh:mm:ss") + " AM";
                                }
                                else   {
                                    excel.Cells[iRow + 1, iCol] = date1.ToString("yyyy-MM-dd hh:mm:ss") + " PM";
                                }


                            }
                            else {
                                excel.Cells[iRow + 1, iCol] = r[c.ColumnName].ToString();
                            }
                        }
                    }
                }

Can anyone give some suggestion on how to improve the performance it takes to export to excel?

Venkat
  • 2,549
  • 2
  • 28
  • 61
Lord Jesus
  • 81
  • 8
  • Which framework do you use to create and write to the sheet? – Magnus Apr 01 '18 at 12:43
  • You should start with filtering out data before reading it to the system the conditions like (c.ColumnMapping != MappingType.Hidden) and (c.DataType == typeof(DateTime) should be taken to db level so that you will have less data to be processed and this condition wont be required. simply think of reducing the cyclomatic complexity – Guruparan Giritharan Apr 01 '18 at 12:43
  • @Magnus This is an old application and it's targeting 2.0 Net framework. – Lord Jesus Apr 01 '18 at 12:44
  • I mean what is `excel`? Is it an external library or automation or other? – Magnus Apr 01 '18 at 12:45
  • @Guru Makes sense but the foreach( DataColumn c in dt.Columns) depends on those condition as well. So would that still be possible? – Lord Jesus Apr 01 '18 at 12:45
  • @Magnus Oh, the excel is an external library DLL that I add reference to so I can export the data to excel. My code is like this: Excel.ApplicationClass excel = new Excel.ApplicationClass(); – Lord Jesus Apr 01 '18 at 12:47
  • I edited more code to include the whole function. – Lord Jesus Apr 01 '18 at 12:48
  • Interop with an out-of-process server is never cheap, the round-trip time for a simple assignment can kill perf dead. Don't just update one cell at a time, update an entire range with an array. Many existing Q+A about it, like [this one](https://stackoverflow.com/q/17062599/17034). Getting sloppy with an OpenXML-based library is not generally a problem, like ClosedXML. – Hans Passant Apr 01 '18 at 12:54
  • Most likely that excel library is you bottleneck. I usually use [Aspose.Cells](https://products.aspose.com/cells/net) for this purpose. – Magnus Apr 01 '18 at 12:55
  • @Magnus How good is Aspose in terms of performance? – Lord Jesus Apr 01 '18 at 13:03
  • I never had any performance issues with it. – Magnus Apr 01 '18 at 13:06
  • I can't seem to install it in Package Manager. – Lord Jesus Apr 01 '18 at 13:09

2 Answers2

3

How are you accessing Excel - via COM interop?

If so, you'll want to avoid setting each cell's value individually, because that is very slow indeed. Try to assign arrays to ranges as large as you possibly can.

For example, since you seem to iterate over each row unconditionally, it looks like you could at least try build an array/vector representing one column, and assigning it to an appropriate range in that column.

Bogey
  • 4,926
  • 4
  • 32
  • 57
  • Ye I am using an Excel.DLL to access this function – Lord Jesus Apr 01 '18 at 12:53
  • Can you tell me which part of my code is the MAIN issue? – Lord Jesus Apr 01 '18 at 12:53
  • 2
    Excel.DLL isn't quite specific enough - there are interop libraries (official Excel interop, Netoffice etc.) working with currently open workbooks, and then stuff such as OpenXML/ClosedXML that work on the file format themselves. In any case, I'd suspect your culprit is basically every call to excel.Cells[iRow + 1, iCol] = .... try to optimize your loop so you set entire ranges instead, e.g. build up an array representing all values in a given column, and then do something like excel.Range[ ... whatever the location/dimension of your target range...] = yourArray – Bogey Apr 01 '18 at 12:55
  • yea i guessed it's office.interop.owc11 – Lord Jesus Apr 01 '18 at 12:58
1

There’s no obvious way to improve this. If the loop is taking the time (rather than the sql procedure) then the majority of the work must be being done by the excel library. You could look into different libraries for excel, and if you’re lucky one might allow building rows in parallel, or just be faster. Otherwise you’re limited to making small optimisations on your own code inside the loop. You could probably improve that date time code a bit, but I’d be surprised if it would make much difference.

Do you need an excel sheet or would a csv (which can be opened by excel) do? If a csv would work then it might be possible to export from sql directly or build the rows in parallel yourself without relying on libraries.

gandaliter
  • 9,863
  • 1
  • 16
  • 23
  • A CSV would do as well. Here's the thing, when I export and filter the dates ( say 3 years ago to today) and there's only about 20k records, it won't even load finish. Although I think improving the loop won't make a BIG improvement towards the performance. Even as of now, it takes 4-5 mins just to export 1 month of data which contains less than 2,000. I looked up to external library like EPPlus which helps export data faster but the problem is it only works with net framework 3.5 and later and my app is using 2.0 so that's gonna be a huge problem – Lord Jesus Apr 01 '18 at 12:52
  • A csv you can create yourself without any external library. – Magnus Apr 01 '18 at 13:05
  • But i don't think it will be a huge improvement. In my scenario, I have to be looking for a big improvement. – Lord Jesus Apr 01 '18 at 13:45
  • If that loop is taking most of the time then writing csvs yourself using string tools probably will be a big improvement – gandaliter Apr 01 '18 at 14:00