6

I am writing an application to open an Excel sheet and read it

MyApp = new Excel.Application();
MyBook = MyApp.Workbooks.Open(filename);
MySheet = (Excel.Worksheet)MyBook.Sheets[1]; // Explict cast is not required here
lastRow = MySheet.Cells.SpecialCells(Excel.XlCellType.xlCellTypeLastCell).Row;
MyApp.Visible = false;

It takes about 6-7 seconds for this to take place, is this normal with interop Excel?

Also is there a quicker way to Read an Excel than this?

string[] xx = new string[lastRow];
for (int index = 1; index <= lastRow; index++)
{
   int maxCol = endCol - startCol;
   for (int j = 1; j <= maxCol; j++)
   {
      try
      {
         xx[index - 1] += (MySheet.Cells[index, j] as Excel.Range).Value2.ToString();
      }
      catch
      {    
      }

      if (j != maxCol) xx[index - 1] += "|";
   }
}
MyApp.Quit();
System.Runtime.InteropServices.Marshal.ReleaseComObject(MySheet);
System.Runtime.InteropServices.Marshal.ReleaseComObject(MyBook);
System.Runtime.InteropServices.Marshal.ReleaseComObject(MyApp);
DIF
  • 2,470
  • 6
  • 35
  • 49
SAEED
  • 79
  • 1
  • 7

4 Answers4

5

Appending to the answer of @RvdK - yes COM interop is slow.

Why is it slow?

It is due to the fact how it works. Every call made from .NET must be marshaled to local COM proxy from there it must be marshaled from one process (your app) to the COM server (Excel) (through IPC inside Windows kernel) then it gets translated (dispatched) from the server's local proxy into a native code where arguments get marshaled from OLE Automation compatible types into native types, their validity checked and the function is performed. Result of the function travels back approximately same way through several layers between 2 different processes.

So each and every command is quite expensive to execute, the more of them you do the slower the whole process is. You can find lots of documentation all around the web as COM is old and well working standard (somehow dying with Visual Basic 6).

One example of such article is here: http://www.codeproject.com/Articles/990/Understanding-Classic-COM-Interoperability-With-NE

Is there a quicker way to read?

  1. ClosedXML can both read and write Excel xlsx files (even formulas, formatting and stuff) using Microsoft's OpenXml SDK, see here: https://closedxml.codeplex.com/wikipage?title=Finding%20and%20extracting%20the%20data&referringTitle=Documentation

  2. Excel data reader claims to be able to read both legacy and new Excel data files, I did not try it myself, take a look here: https://exceldatareader.codeplex.com/

  3. another way to read data faster is to use Excel automation to translate sheet into a data file that you can understand easily and batch process without the interop layer (e.g. XML,CSV). This answer shows how to do it

Community
  • 1
  • 1
xmojmr
  • 8,073
  • 5
  • 31
  • 54
  • 1
    This certainly warrants more upvotes. This confirmed some of my suspicions as to the COM workflow with Interop assemblies. I now know why 88.6% of my runtime is spent in these calls to Excel. Tracing performance issues in this scenario is not easy. – Anthony Mason Feb 23 '17 at 20:57
  • 1
    You'd think a giant company like Microsoft - especially given the demand for reading/writing Excel files - could afford to produce decent tools for same beyond: (1) clunky COM (2) Db drivers that screw with the data and provide little control over the import (3) Obtuse OpenXML SDK. They really should be embarrassed by themselves. – B H Oct 30 '18 at 06:29
3

Short answer: correct, interop is slow. (had the same problem, taking couple of seconds to read 300 lines...

Use a library for this:

RvdK
  • 19,580
  • 4
  • 64
  • 107
  • 3
    not an answer to me... Why is it slow? Maybe you can't properly use it? Reading an Excel file using COM and iterating just sounds like a bad idea to me to start with.. `Range`s are 2D arrays, you "import" an entire range into an Array and than you work with it.. Using 3rd party libraries isn't going to improve by much if you're still iterating.... –  Apr 22 '14 at 07:25
  • I agree this shouldn't be an "answer". But I love EPPlus. Quick and easy to use. – mason Apr 22 '14 at 16:25
2

This answer is only about the second part of your question. Your are using lots of ranges there which is not as intended and indeed very slow.

First read the complete range and then iterate over the result like so:

var xx[,] = (MySheet.Cells["A1", "XX100"] as Excel.Range).Value2;
for (int i=0;i<xx.getLength(0);i++)
{
    for (int j=0;j<xx.getLength(1);j++)
    {
         Console.WriteLine(xx[i,j].toString());
    }
}

This will be much faster!

Martin Meeser
  • 2,784
  • 2
  • 28
  • 41
  • 1
    I agree, read an entire range to a 2D array, but please, do not iterate like this just to show results ... Convert that 2d array to an Enumerable list and print it all in one go... –  Apr 22 '14 at 07:28
1

You can use this free library, xls & xlsx supported,

Workbook wb = new Workbook();
wb.LoadFromFile(ofd.FileName);

https://freenetexcel.codeplex.com/

hark.den
  • 11
  • 1