0

I am reading data from numerous excel worksheets, but the performance is slow since I am fetching each col directly. Is there a way I can read the entire UsedRange into memory with one call? Then process the rows/cols locally?

The code I have is basically this:

xlWorkSheet = (Worksheet)_xlWorkBook.Worksheets.get_Item(1);
var range = xlWorkSheet.UsedRange;
for (var rCnt = 2; rCnt <= range.Rows.Count; rCnt++)
{
    // Process column entries
}
Camilo Terevinto
  • 31,141
  • 6
  • 88
  • 120
Tom Lindley
  • 351
  • 1
  • 3
  • 14
  • have you tried: range UsedRange { get;}: https://msdn.microsoft.com/en-us/library/microsoft.office.tools.excel.worksheet.usedrange.aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1 – Clyde Jul 22 '16 at 17:19
  • Are you working with the Excel 2007 (xlsx) format or the older binary one (xls)? If it's the newer one I would suggest that you look at http://epplus.codeplex.com to work with this instead of using the automation stuff, it should be way faster. – Karl-Johan Sjögren Jul 22 '16 at 17:30
  • Never mind. Found this right after I posted. I am just using: var array = (System.Array) range.Cells.Value; Working like a champ. Thanks. – Tom Lindley Jul 22 '16 at 17:48
  • BTW: this went from something like 15 minutes to process ~28k records to under a second. – Tom Lindley Jul 22 '16 at 17:49

1 Answers1

0

I had the same problem while handling very large excel I managed to read it as range and then transformed it to List> using AsParallel() on each row It made it to run much faster

Here is the code:

private List<List<string>> ReadExcelFile(string fileName)
    {
        Excel.Application xlApp = null;
        Workbook xlWorkbook = null;
        Sheets xlSheets = null;
        Worksheet xlSheet = null;
        var results = new List<List<string>>();

        try
        {
            xlApp = new Microsoft.Office.Interop.Excel.Application();

            xlWorkbook = xlApp.Workbooks.Open(fileName, Type.Missing, true, Type.Missing, Type.Missing, Type.Missing, true, XlPlatform.xlWindows, Type.Missing,false, false, Type.Missing, false, Type.Missing, Type.Missing);

            xlSheets = xlWorkbook.Sheets as Sheets;

            xlSheet = xlSheets[1];

            // Let's say your range is from A1 to DG5200
            var cells = xlSheet.get_Range("A1", "DG5200");

            results = ExcelRangeToListsParallel(cells); 
        }
        catch (Exception)
        {
            results = null;
        }
        finally
        {
            xlWorkbook.Close(false);
            xlApp.Quit();

            if (xlSheet != null)
                Marshal.ReleaseComObject(xlSheet);
            if (xlSheets != null)
                Marshal.ReleaseComObject(xlSheets);
            if (xlWorkbook != null)
                Marshal.ReleaseComObject(xlWorkbook);
            if (xlApp != null)
                Marshal.ReleaseComObject(xlApp);
            xlApp = null;                
        }

        return results;
    }

    private List<List<String>> ExcelRangeToListsParallel(Excel.Range cells)
    {            

        return cells.Rows.Cast<Excel.Range>().AsParallel().Select(row =>
        {         
            return row.Cells.Cast<Excel.Range>().Select(cell =>
            {
                var cellContent = cell.Value2;                    
                return (cellContent == null) ? String.Empty : cellContent.ToString(); 
            }).Cast<string>().ToList();                
        }).ToList();
    }
Oz Ben-David
  • 1,589
  • 1
  • 16
  • 26
  • Thanks, that seems like an elegant way to grab the data. – Tom Lindley Jul 23 '16 at 23:32
  • Hi Tom, So you can mark it as answer. BTW, another good approach is to relate to the excel file as DB, like here - http://stackoverflow.com/questions/15828/reading-excel-files-from-c-sharp?page=1&tab=votes#tab-top – Oz Ben-David Jul 25 '16 at 12:09