63

I was wondering what I could do to improve the performance of Excel automation, as it can be quite slow if you have a lot going on in the worksheet...

Here's a few I found myself:

  • ExcelApp.ScreenUpdating = false -- turn off the redrawing of the screen

  • ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual -- turning off the calculation engine so Excel doesn't automatically recalculate when a cell value changes (turn it back on after you're done)

  • Reduce calls to Worksheet.Cells.Item(row, col) and Worksheet.Range -- I had to poll hundreds of cells to find the cell I needed. Implementing some caching of cell locations, reduced the execution time from ~40 to ~5 seconds.

What kind of interop calls take a heavy toll on performance and should be avoided? What else can you do to avoid unnecessary processing being done?

Jeremy Thompson
  • 61,933
  • 36
  • 195
  • 321
Vincent Van Den Berghe
  • 5,425
  • 2
  • 31
  • 40
  • 3
    +1 I also have performance problems using the Excel Interop, and I have learned something new: `ExcelApp.Calculation = Excel.XlCalculation.xlCalculationManual`. Thanks! =) – Will Marcouiller Mar 04 '11 at 18:33
  • Thank you for sharing your current finds in the question, very useful. – Alex Feb 17 '16 at 15:04
  • some of the other possible options are here http://stackoverflow.com/documentation/excel-vba/1107/vba-best-practices/5925/switch-off-properties-during-macro-execution – Slai Feb 12 '17 at 01:51
  • I know this is old but there here is an option. Convert your data into csv file, save as xlsx, then make all your modifications to the existing Workbook. I had 6k records 131 columns in under 2s. – Train Oct 06 '22 at 22:55

7 Answers7

50

When using C# or VB.Net to either get or set a range, figure out what the total size of the range is, and then get one large 2 dimensional object array...

//get values
object[,] objectArray = shtName.get_Range("A1:Z100").Value2;
iFace = Convert.ToInt32(objectArray[1,1]);

//set values
object[,] objectArray = new object[3,1] {{"A"}{"B"}{"C"}};
rngName.Value2 = objectArray;

Note that its important you know what datatype Excel is storing (text or numbers) as it won't automatically do this for you when you are converting the type back from the object array. Add tests if necessary to validate the data if you can't be sure beforehand of the type of data.

Anonymous Type
  • 3,051
  • 2
  • 27
  • 45
  • +1 For using 2D object arrays. There's also the `shtName.UsedRange.get_Value(XlRangeValueDataType.XlRangeValueDefault)` that can be type-casted to a two dimensional object array and will retrieve all the cells' values at once. – Will Marcouiller Mar 04 '11 at 18:36
  • 2
    @Will Marcouiller: Yes but the problem with using the UsedRange property is that it has a negative performance impact. I'm not sure its even that much faster than using cells/offset methods. – Anonymous Type Mar 06 '11 at 21:11
  • That's great, but how can one add images to xls using this optimized technique ? In my application images insertion is a bottleneck. Each generated XLSX file has at least 300-400 images. The current solution calls `worksheet.Shapes.AddPicture()` method as many times as the number of pictures. This is really slow. – mr_esp Oct 13 '13 at 19:02
  • If you're needing additional functionality (like formatting), which performs horrible using the COM object, you can try EPPlus from this answer - http://stackoverflow.com/questions/6583136/how-can-i-speed-up-creating-excel-excel-spreadsheets-from-c – Jeff Fischer Feb 03 '15 at 03:27
  • 1
    @JeffFischer OP has "a lot going on in worksheet". The EPPlus recommendation is out of touch with what we're talking about. – Jeremy Thompson Feb 03 '15 at 10:45
13

This is for anyone wondering what the best way is to populate an excel sheet from a db result set. This is not meant to be a full list by any means but it does list a few options.

Some performance numbers while attempting to populate an excel sheet with 155 columns and 4200 records on an old Pentium 4 3GHz box including data retrieval time which was never more than 10 seconds in order of slowest to fastest is as follows...

  1. One cell at a time - Just under 11 minutes

  2. Populating a dataset by converting to html + Saving html to disk + Loading html into excel and saving worksheet as xls/xlsx - 5 minutes

  3. One column at a time - 4 minutes

  4. Using the deprecated sp_makewebtask procedure in SQL 2005 to create an HTML file - 9 Seconds + Followed by loading the html file in excel and saving as XLS/XLSX - About 2 minutes.

  5. Convert .Net dataset to ADO RecordSet and use the WorkSheet.Range[].CopyFromRecordset function to populate excel - 45 seconds!

I ended up using option 5. Hope this helps.

aphoria
  • 19,796
  • 7
  • 64
  • 73
Ritesh
  • 161
  • 1
  • 4
  • hmmm nice hybrid approach in (5) However have you tested a sixth approach, using an OLEDbConnection to the workbook and populating the sheet as a table? limitations with this approach include needing to know the schema of each column beforehand (to prevent incorrect type conversion by excel). – Anonymous Type Jan 18 '12 at 22:54
  • 1
    @AnonymousType - I must admit that I didn't try populating the sheet as a table. I still wish that Microsoft had provided a "CopyFromDataSet" function for .Net devs :-). – Ritesh Jun 12 '12 at 14:01
  • that would have been a very sweet VSTO feature. – Anonymous Type Jun 13 '12 at 00:34
  • I know this is old but there is an option 6. Convert your data into csv file, save as xlsx, then make all your modifications to the existing WB. I had 6k records 131 columns in under 2s. Multiple sheets means each sheet is a csv. – Train Oct 06 '22 at 22:54
6

If you're polling values of many cells you can get all the cell values in a range stored in a variant array in one fell swoop:

Dim CellVals() as Variant
CellVals = Range("A1:B1000").Value

There is a tradeoff here, in terms of the size of the range you're getting values for. I'd guess if you need a thousand or more cell values this is probably faster than just looping through different cells and polling the values.

Jon Fournier
  • 4,299
  • 3
  • 33
  • 43
  • 5
    -1, no offense to Jon, but how did this get upvoted so much? The question is talking about Excel Interop not VBA/VB6. pretty sure Variant doesn't even exist as a type from COM interop (uses object). – Anonymous Type Mar 06 '11 at 21:12
  • 1
    Range.Value works for COM too. and therefore the point if its example for VBA/.NET is kinda irrelevant here, and doesn't change the answer. i'd agree that it was better if Jon would change the sample. – Tomer W Jul 21 '16 at 20:59
  • Not to mention this answer was here a year before the accepted answer, and that one is only slightly better since it has the reverse operation. – Chris Jul 24 '18 at 18:27
5

Use excels builtin functionality whenever possible, for example: Instead of searching a whole column for a given string, use the find command available in the GUI by Ctrl-F:

Set Found = Cells.Find(What:=SearchString, LookIn:=xlValues, _
    SearchOrder:=xlByRows, SearchDirection:=xlNext, _
    MatchCase:=False, SearchFormat:=False)

If Not Found Is Nothing Then
    Found.Activate
    (...)
EndIf

If you want to sort some lists, use the excel sort command, don't do it manually in VBA:

Selection.Sort Key1:=Range("A1"), Order1:=xlAscending, Header:=xlGuess, _
    OrderCustom:=1, MatchCase:=False, Orientation:=xlTopToBottom, _
    DataOption1:=xlSortNormal
Treb
  • 19,903
  • 7
  • 54
  • 87
4

As Anonymous Type says: reading/writing large range blocks is very important to performance.

In cases where the COM-Interop overhead is still too large you may want to switch to using the XLL interface, which is the fastest Excel interface.

Although the XLL interface is primarily meant for C++ users, both XL DNA and Addin Express provide .NET to XLL bridge capability which is significantly faster than COM-Interop.

Charles Williams
  • 23,121
  • 5
  • 38
  • 38
2

Performance also depends a lot on how you automate Excel. VBA is faster than COM automation is faster than .NET automation. And typically early (compile time) binding is faster than late binding, too.

If you have serious performance problems you could think of moving the critical parts of the code to a VBA module and call that code from your COM/.NET automation code.

If you use .NET you should also use the optimized primary interop assemblies available from Microsoft and not use custom-built interop assemblies.

Dirk Vollmar
  • 172,527
  • 53
  • 255
  • 316
  • 4
    all true. but if you follow the advice of not making lots of little Range object calls to set Value2 property, and just pass in an Object array or get one, you won't really need to use VBA. – Anonymous Type Feb 19 '10 at 04:28
0

Another big thing you can do in VBA is to use Option Explicit and avoid Variants wherever possible. Variants are not 100% avoidable in VBA, but they make the interpreter do more work at runtime and waste memory.

I found this article very helpful when I was starting with VBA in Excel.
http://www.ozgrid.com/VBA/SpeedingUpVBACode.htm

And this book

http://www.amazon.com/VB-VBA-Nutshell-Language-OReilly/dp/1565923588

Similar to

 app.ScreenUpdates = false //and
 app.Calculation = xlCalculationManual

you can also set

 app.EnableEvents = false //Prevent Excel events
 app.Interactive = false  //Prevent user clicks and keystrokes

although they don't seem to make as big a difference as the first two.

Similar to setting Range values to arrays, if you are working with data that is mostly tables with the same formula in every row of a column, you can use R1C1 formula notation for your formula and set an entire column equal to the formula string to set the whole thing in one call.

app.ReferenceStyle = xlR1C1
app.ActiveSheet.Columns(2) = "=SUBSTITUTE(C[-1],"foo","bar")"

Also, creating XLL add-ins using ExcelDNA & .NET (or the hard way in C) is also the only way you can get UDFs to run on multiple threads. (See Excel DNA's ExcelFunction attribute's IsThreadSafe property.)

Before I transitioned to Excel DNA completely, I also experimented with creating COM visible libraries in .NET to reference in VBA projects. Heavy text processing is a bit faster than VBA that way, as are using wrapped .NET List classes instead of VBA's Collection, but Excel DNA is better.

JamesFaix
  • 8,050
  • 9
  • 37
  • 73