0

I'm using C# to pull data from an Excel file. I need to get the text and some minor formatting data from a sheet. My test sheet has 115 rows and 10 columns. The performance seems sluggish. If I only pull out the text using the code below it takes about 2 seconds to run. If I check the font (in the if(c.Font.Bold==null..... line) it goes up to 8 seconds. If I get the borders info then it goes up to 17 seconds.

The problem is that I'll have many, many sheets I need to pull data from and speed will become an issue. Any suggestions on what I can do to speed this up? I really appreciate any help.

foreach (Range c in oSheet.UsedRange.Cells)
{
    var txt = c.Text;
    if (c.Font.Bold == null || c.Font.Italic == null || Convert.ToInt32(c.Font.Underline) > 0 || Convert.ToBoolean(c.Font.Bold) || Convert.ToBoolean(c.Font.Italic))
        txt = "";

    var borderBottom = c.Borders.Item[Microsoft.Office.Interop.Excel.XlBordersIndex.xlEdgeBottom].LineStyle;
    var borderTop = c.Borders.Item[Microsoft.Office.Interop.Excel.XlBordersIndex.xlEdgeTop].LineStyle;
    var borderLeft = c.Borders.Item[Microsoft.Office.Interop.Excel.XlBordersIndex.xlEdgeLeft].LineStyle;
    var borderRight = c.Borders.Item[Microsoft.Office.Interop.Excel.XlBordersIndex.xlEdgeRight].LineStyle;
}
Mohammad Dehghan
  • 17,853
  • 3
  • 55
  • 72
nickfinity
  • 1,119
  • 2
  • 15
  • 29

3 Answers3

3

If your Excel file is a Excel 2007/2010 file (.xlsx), you can use ExcelPackage or EPPlus components to read the file. They are mush faster that office interop.

I used EPPlus and it iterated over 2000 cell almost instantly!

ExcelPackage ep = new ExcelPackage(new FileStream(path, FileMode.Open, FileAccess.Read));
var sheet = ep.Workbook.Worksheets[1];
foreach (var cell in sheet.Cells[sheet.Dimension.Address])
{
    var txt = cell.Text;
    var font = cell.Style.Font;
    if (!font.Bold || font.Italic || font.UnderLine)
        txt = "";
    var borderBottom = cell.Style.Border.Bottom.Style;
    var borderTop = cell.Style.Border.Top.Style;
    var borderLeft = cell.Style.Border.Left.Style;
    var borderRight = cell.Style.Border.Right.Style;
    // ...
}
Mohammad Dehghan
  • 17,853
  • 3
  • 55
  • 72
  • Thanks. It took less than 1/10 of a second to pull in the data I need. Well, not quite all of it because of some of the differences, but I'll see if I can use EPPlus instead. Pretty amazing speed difference. – nickfinity Feb 04 '13 at 17:19
  • Quick question for you. Do you know if there is a way to check the font characteristics of individual characters using EPPlus? – nickfinity Feb 04 '13 at 19:40
  • @nickfinity Yes, it is easy. But please ask another question for that, so other users with same question can find the answer easily. By doing this you also earn more comminuty reputation and more answers from the community. – Mohammad Dehghan Feb 05 '13 at 06:46
  • 1
    @nickfinity Since you didn't ask any question, I try to answer your last question here :) Use `if( cell.IsRichText )foreach (var rt in cell.RichText){...}`. Then you have `rt.Text`, `rt.FontName`, `rt.Bold`, etc. You see why you should ask a new question :-) – Mohammad Dehghan Feb 05 '13 at 18:00
  • I was playing around with it here to see if I could figure it out first. What you posted it what I came up with, but it's not quite working. It says it's richtext, but then bold is always false. I'll post a question later if I can't figure it out. Thanks for the help. – nickfinity Feb 05 '13 at 18:26
1

I'm not at all familiar with C#, but in VBA I use Application.ScreenUpdating property set to false on the start and set back to true when finished. In general case this dramatically increases speed, especially if macro performs any visible sheets updates.

I'm pretty sure such property should be available in C# as well. Hope that was helpful)

Peter L.
  • 7,276
  • 5
  • 34
  • 53
  • Thanks for the help. I didn't know about that setting, but it didn't seem to make a difference. I am setting Application.Visible=false, so maybe that overrides the ScreenUpdating setting. – nickfinity Feb 04 '13 at 14:33
  • @nickfinity frankly, I'm not sure which overrides which, and there are no clear suggestions about it in google) perhaps someone more experienced will bring the light on that. – Peter L. Feb 04 '13 at 15:13
  • @nickfinity 1 more suggestion - use arrays for input/output of large data sets instead of operating individual items. Read more (as an example): http://www.ozgrid.com/forum/showthread.php?t=46525 – Peter L. Feb 04 '13 at 15:16
  • @nickfinity And 1 more thing: http://stackoverflow.com/a/14663999/1953175 - do not neglect `Range.Borders(xlInsideVertical)` and `Range.Borders(xlInsideHorizontal)` properties. That's much faster than draw all borders for the set of individual cells) – Peter L. Feb 04 '13 at 15:25
  • Thanks for the help. I'm not quite sure I'm following you on the arrays. I'm getting the data for the UsedRange. – nickfinity Feb 04 '13 at 17:32
  • @nickfinity this is general approach, perhaps not for your case - but still actual. 2 months ago I had to fetch regularly web table about 2k rows x 10-15 cols, so the difference in speed between range and cell-by cell output was dramatic. – Peter L. Feb 04 '13 at 17:40
1

You could use the below steps. This is very fast and one line code ( no need of loops and all). I am taking a simple excel to explain here :

Before

enter image description here

I managed to store the range as A1:C4 in a variable dynamically in exRange and used the below code to give border

((Range)excelSheet.get_Range(exRange)).Cells.Borders.LineStyle = XlLineStyle.xlContinuous;


After

enter image description here

Sarath Subramanian
  • 20,027
  • 11
  • 82
  • 86