
I am using Microsoft.Office.Interop.Excel to read a spreadsheet that is open in memory.

gXlWs = (Microsoft.Office.Interop.Excel.Worksheet)gXlApp.ActiveWorkbook.ActiveSheet;
int NumCols = 7;
string[] Fields = new string[NumCols];
string input = null;
int NumRow = 2;
while (Convert.ToString(((Microsoft.Office.Interop.Excel.Range)gXlWs.Cells[NumRow, 1]).Value2) != null)
{
    for (int c = 1; c <= NumCols; c++)
    {
        Fields[c-1] = Convert.ToString(((Microsoft.Office.Interop.Excel.Range)gXlWs.Cells[NumRow, c]).Value2);
    }
    NumRow++;

    //Do my other processing
}

I have 180,000 rows and this turns out to be very slow. I am not sure the "Convert" is efficient. Is there any way I could do this faster?

Moon

ManInMoon
  • Do you know if you're IO bound or CPU bound? You could probably parallelize this code on a multi core/cpu machine. – Roman Sep 30 '11 at 17:54
  • You can also run SQL queries against an Excel spreadsheet using ADO.NET. This way you can avoid the while loop. See https://stackoverflow.com/questions/6789648/query-excel-sheet-in-c-sharp – Maxter Aug 21 '18 at 20:37

6 Answers


I found a much faster way.

It is better to read the entire block of data in one go using "get_Range". This loads the data into memory, and I can loop through it like a normal array.

Microsoft.Office.Interop.Excel.Range range = gXlWs.get_Range("A1", "F188000");
// Value2 returns the whole block as an object[,] in a single COM call.
object[,] values = (object[,])range.Value2;

// Note: the array returned by Value2 is 1-based in both dimensions,
// so valid row indices run from 1 to values.GetLength(0) inclusive.
int NumRow = 1;
while (NumRow <= values.GetLength(0))
{
    for (int c = 1; c <= NumCols; c++)
    {
        Fields[c - 1] = Convert.ToString(values[NumRow, c]);
    }
    NumRow++;
}
ManInMoon
  • what is NumRow++ means? – Haminteu Aug 04 '14 at 08:30
  • NumRow++ is the same as NumRow = NumRow + 1; – Lereveme Jun 16 '15 at 14:01
  • You have used "F188000". In my case I don't know how many rows there would be. What should I do? – user728630 Sep 17 '16 at 02:06
  • You can speed up the conversion using `Array.Copy()`. See http://stackoverflow.com/a/5083690/380384. Of course it depends what the cell contents are. Are they numbers or strings? – John Alexiou Oct 14 '16 at 17:41
  • Note that the resulting 2-D array is (oddly) 1-based rather than 0-based, despite the fact that it looks like a standard C# array. But yes, this is orders of magnitude faster than iterating through the underlying cells - thanks. – Ken Smith Jan 03 '17 at 18:32
  • Absolutely fantastic... seriously, code that took minutes is done in the blink of an eye. – Hambone Oct 06 '17 at 17:22
  • 2
    @ user728630 Another method of getting the data range is: `Excel.Excel.Range DataRange = ExcelWorksheet.UsedRange; object[,] ExcelRangeValues = (object[,])DataRange.Value2; var RowCount = DataRange.Rows.Count + 1; var ColumnCount = DataRange.Columns.Count + 1;` – Hex Mar 20 '18 at 10:30
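A side note on Ken Smith's point about 1-based indexing: the lower bound of the array Value2 returns can be reproduced with plain `Array.CreateInstance`, so the loop bounds can be checked without Excel. A minimal sketch (the cell contents here are made up); note the `<=` against `GetUpperBound(0)`, since a `<` test against `GetLength(0)` on a 1-based array would silently skip the last row:

```csharp
using System;

class OneBasedDemo
{
    static void Main()
    {
        // Simulate Value2's 1-based object[,]: 3 rows x 2 columns,
        // lower bound 1 in each dimension.
        var values = (object[,])Array.CreateInstance(
            typeof(object), new[] { 3, 2 }, new[] { 1, 1 });

        for (int r = 1; r <= 3; r++)
            for (int c = 1; c <= 2; c++)
                values[r, c] = $"R{r}C{c}";   // made-up cell contents

        // Safe bounds regardless of base: GetLowerBound/GetUpperBound.
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
                Console.WriteLine(values[r, c]);
    }
}
```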

There are several options - all involve an additional library:

  • OpenXML 2.0 (a free library from Microsoft) can be used to read and modify the content of an .xlsx file, so you can do with it what you want

  • some (commercial) third-party libraries come with grid controls that let you do much more with Excel files in your application (be it WinForms/WPF/ASP.NET...), such as SpreadsheetGear, Aspose.Cells, etc.

Yahia
  • But is it likely to be any faster? – ManInMoon Sep 30 '11 at 17:36
  • 2
    YES - that's why I proposed it, it accesses the file without using Excel and without the overhead of COM/Interop – Yahia Sep 30 '11 at 17:37
  • But I want to access a spreadsheet that is OPEN in memory. So that I can change it interactively and then run my code again. I think what you are suggesting is reading the saved file from disk - or am I wrong? – ManInMoon Sep 30 '11 at 17:41
  • 1
    @ManInMoon - Then you will have to accept that reading the file that is open in memory is going to have performance problems. You didn't indicate how long it takes to read 180,000 records. – Security Hound Sep 30 '11 at 17:47

I found a really fast way to read Excel files for my specific need: I want the data as a two-dimensional array of strings. With a really big workbook, the old way took about an hour; this way I get my values in about 20 seconds.

I am using this NuGet package: https://reposhub.com/dotnet/office/ExcelDataReader-ExcelDataReader.html

And here is my code:

DataSet result = null;
//https://reposhub.com/dotnet/office/ExcelDataReader-ExcelDataReader.html
using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
{
    // Auto-detect format, supports:
    //  - Binary Excel files (2.0-2003 format; *.xls)
    //  - OpenXml Excel files (2007 format; *.xlsx)
    using (var reader = ExcelReaderFactory.CreateReader(stream))
    {
        result = reader.AsDataSet();
    }
}

foreach (DataTable table in result.Tables)
{
    if (/* my conditions */)
    {
        continue;
    }

    var rows = table.AsEnumerable().ToArray();

    var dataTable = new string[table.Rows.Count][];
    Parallel.For(0, rows.Length, new ParallelOptions { MaxDegreeOfParallelism = 8 },
        i =>
        {
            var row = rows[i];
            dataTable[i] = row.ItemArray.Select(x => x.ToString()).ToArray();                                    
        });

    importedList.Add(dataTable);
}
camille
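The row-to-string conversion above can be exercised without an actual workbook. A minimal sketch, with a hand-built DataTable standing in for one table of `reader.AsDataSet().Tables` (the column names and values here are made up):

```csharp
using System;
using System.Data;
using System.Threading.Tasks;

class DataTableToStringsDemo
{
    static void Main()
    {
        // Stand-in for a table produced by ExcelDataReader's AsDataSet().
        var table = new DataTable();
        table.Columns.Add("A");
        table.Columns.Add("B");
        table.Rows.Add("1", "x");
        table.Rows.Add("2", "y");

        // Same idea as the answer: one string[] per row, filled in parallel.
        var rows = new string[table.Rows.Count][];
        Parallel.For(0, table.Rows.Count, i =>
        {
            rows[i] = Array.ConvertAll(table.Rows[i].ItemArray,
                                       x => x?.ToString() ?? "");
        });

        Console.WriteLine(rows[1][1]); // prints "y"
    }
}
```

DataTable is documented as safe for multithreaded read operations, which is why a Parallel.For over distinct rows (each thread writing only its own `rows[i]` slot) is safe here.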

  I am not sure the "Convert" is efficient. Is there any way I could do this faster?

What makes you believe this? I promise you that Convert.ToString() is the most efficient call in the code you posted. Your problem is that you are looping through 180,000 records in an Excel document, one cell at a time.

You could split the work up; since you know the number of rows, this is trivial to do.

Why are you converting Value2 to a string, exactly?

Security Hound
  • 1
    @ManInMoon - Explain what you mean by that statement. You do understand your String[] can only only 7 Strings right? So your code doesn't make a great deal of sense if your reading 180,000 records. – Security Hound Sep 30 '11 at 17:45
  • WHere I have put "//Do my other procoessing" I deal with each "row" separately – ManInMoon Oct 01 '11 at 04:44

Use the OleDb method. It is very fast:

string con =
  @"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=D:\temp\test.xls;" + 
  @"Extended Properties='Excel 8.0;HDR=Yes;'";    
using(OleDbConnection connection = new OleDbConnection(con))
{
    connection.Open();
    OleDbCommand command = new OleDbCommand("select * from [Sheet1$]", connection); 
    using(OleDbDataReader dr = command.ExecuteReader())
    {
         while(dr.Read())
         {
             var firstCol = dr[0]; // first column of the current row
             Console.WriteLine(firstCol);
         }
    }
}
Zeeshan Ahmad Khalil
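One caveat about the connection string above: the Jet 4.0 provider only understands the legacy .xls format. For .xlsx files the ACE provider is needed instead (it ships with the Microsoft Access Database Engine redistributable; the path below is just an example):

```csharp
// "Excel 12.0 Xml" selects the OpenXML (.xlsx) format; HDR=Yes treats the
// first row as column headers, as in the Jet example above.
// The Data Source path is a placeholder; substitute your own workbook.
string conXlsx =
    @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=D:\temp\test.xlsx;" +
    @"Extended Properties='Excel 12.0 Xml;HDR=Yes;'";
```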

I guess the Convert is not the source of the slowness; retrieving cell values through Interop, one cell at a time, is what is slow.

I think this cast is not necessary:

(Microsoft.Office.Interop.Excel.Range)gXlWs.Cells[NumRow, 1]

It should work without it, and you can test directly:

gXlWs.Cells[NumRow, 1].Value != null

Try to move the entire range, or at least an entire row, into an object matrix and work with that instead of the Range itself.

masaishi
  • 2
    While you make good points none of that would increase the speed of his code. The conversion is not slowing down his code. – Security Hound Sep 30 '11 at 17:42
  • Would reading the entire range in (a matrix of 180,000,7) be faster? I have seen a ref to this with .get_range but I am not sure how you would use it - any clues would be great. – ManInMoon Sep 30 '11 at 18:34
  • @ManInMoon - It wouldn't be faster, it would make more sense, you are still going to limited by how you are reading the file. – Security Hound Sep 30 '11 at 18:50
  • It COULD be faster. If there is overhead to every "read" from the OPEN workbook. I just don't know how to use it. – ManInMoon Oct 01 '11 at 04:46
  • Well, the solution you showed above seems to be what I suggested. Do you agree? – masaishi Oct 04 '11 at 17:12