1

I'm reading in a very large Excel file into a VB.net application. The user specifies the file from a file picker at runtime. I'm currently using the Microsoft.Office.Interop.Excel.Application application interface to read in the Excel data cell-by-cell, row-by-row. Like this...

first = Excel.Cells(rowindex, FirstColumnIndex).value

I need to read big Excel files (80000 rows) and the current method takes a long time. I'm wondering if there is a faster way. I know it is possible to read in a file with ADO.net. Is this faster? Is there an even faster way?

Note Obviously, keeping so much data in an Excel file is not idea, but that's where the data that I need (some of it from other places) is being stored.

devuxer
  • 41,681
  • 47
  • 180
  • 292
bernie2436
  • 22,841
  • 49
  • 151
  • 244

3 Answers3

5

I would guess ADO.NET is faster in most circumstances, but I would not recommend it because it just doesn't work very well. There are a host of problems, such as the data types for the columns being determined by making a guess based on the first few rows, as opposed to some other more reliable method. You're also limited in terms of flexibility, such as how you'd like to handle empty cells or cells with #N/A.

As for "interop", I'm not crazy about that either because (1) you have to have Excel installed, (2) you now have to deal with the cumbersome Excel API, and (3) you now have to worry about properly disposing COM objects (otherwise, you'll end up with Excel processes hanging around long after you're done with your import--see the first three answers in How do I properly clean up Excel interop objects?). That said, if you choose to use interop, you can speed things up by using arrays rather than the one-cell-at-a-time method (see Can you paste a block of cells in one shot using Excel Interop w/o using the clipboard?).

The option I've started using recently is EPPlus. This open-source project has many advantages over the two methods you asked about. First, I believe it to be at least as fast as ADO.NET, if not faster. Second, it doesn't require you to have Excel installed because it reads from and writes to the .xlsx file directly. The one disadvantage is that it won't work for Excel documents saved in pre-2007 format. One other disadvantage is that the API for EPPlus is somewhat unusual in places, and it can be a little confusing. If you choose to go this route, leave me a comment, and I can provide some good sample code to get you started.

Community
  • 1
  • 1
devuxer
  • 41,681
  • 47
  • 180
  • 292
2

Thanks for all of the great answers. ADO.Net was way, way, way faster. Like 15 or 20 minutes faster. I posted here wondering if I should bother re-writing the code in ADO.net for minor speed ups. But, the answer is clearly that ADO is much, much faster than Office.Interop as a way to read in Excel files.

bernie2436
  • 22,841
  • 49
  • 151
  • 244
1

In case you are dealing with an Excel 2007 (or newer) file with the .xslx extension, you can also try the OpenXml SDK:

http://www.microsoft.com/download/en/details.aspx?id=5124

An example, even for large files, can be found here: http://blogs.msdn.com/b/brian_jones/archive/2010/05/27/parsing-and-reading-large-excel-files-with-the-open-xml-sdk.aspx

In case it's a Excel 1997 file (.xls extension) I can recommend this class from Codeproject: http://www.codeproject.com/Articles/14639/Fast-Excel-file-reader-with-basic-functionality

Markus Palme
  • 659
  • 4
  • 15