0

So I'm trying to read an excel file with C# and the document is 181MB. I have tried using Microsoft.Office.Interop.Excel, OpenXML, ClosedXML, and ExcelDataReader. I wasn't able to get OpenXML to work and ClosedXML seems to have issues with large excel file (it also takes at least 6 minutes to read the file). I like ExcelDataReader the most since I can read the data table like an array but it does take 4-5 minutes to read the file which is much faster than Interlop, but it's still a long wait. I'm considering converting the excel document into a csv file, but when I did that the size went from 181 MB to 248 MB so I'm unsure if it will be more efficient. It also forces the users to do an extra step to convert their files into a csv, but if the performance is worth it I will attempt this route.

Unfortunately, I am not able to pre-determine how many columns and rows the excel document will have as the users will be using openFileDialog to select a file.

Is ExcelDataReader the best way to go or is there a better solution?

Here's my current code in case there's some improvements I can make:

OpenFileDialog openFileDialog = new OpenFileDialog();
openFileDialog.Filter = "Excel Files|*.xls;*.xlsx;*.slxm";
if (openFileDialog.ShowDialog() == true)
{
    using (var stream = File.Open(openFileDialog.FileName, FileMode.Open, FileAccess.Read))
    {
        using (var reader = ExcelReaderFactory.CreateReader(stream))
        {
            //results will be in dataSet.Tables
            var dataSet = reader.AsDataSet();
            var dataTable = dataSet.Tables[0];

            int r = 0;
            for(int c = 0; c < dataTable.Columns.Count; c += 3)
            {
                TagListData.Add(new TagClass { IsTagSelected = false, TagName = dataTable.Rows[r][c].ToString(), rIndex = r, cIndex = c });
            }                    
        }
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();
}
kelvinmac
  • 191
  • 1
  • 15
Jummi
  • 123
  • 1
  • 13
  • 2
    May be you unaware, but this is a beat-up subject. You shouldn't use interop of Excel app to read data with c#. You should use special oledb provider. I marked your question as duplicate. Here is the link where to look [Reading excel file using OLEDB Data Provider](https://stackoverflow.com/questions/18511576/reading-excel-file-using-oledb-data-provider) – T.S. Aug 09 '18 at 22:44
  • 2
    For only reading data the OLEDB solution is the way to go. - Your Excel file ist smaller than the CSV because Excel files are compressed, a 181 MB Excel file may well be over 1 GB uncompressed. That is too much to work efficiently in ClosedXML. Even Excel itself doesn't work so well at that file size ... – Raidri Aug 10 '18 at 09:00
  • For what it's worth, Excel is probably not the best solution for this. If you can, use a proper database eg SQLite. – Francois Botha Aug 10 '18 at 20:33
  • I will try to use OLEDB. I do find it a bit confusing to read specific "cell" locations with OLEDB... @FrancoisBotha Unfortunately, everyone's data gets generated into excel documents. So Excel or CSV are my only choices right now – Jummi Aug 10 '18 at 20:59

1 Answers1

1

Idea 1: There is some overhead with ExcelDataReader's AsDataSet - so it's a good idea to use the reader interface directly when working with large sheets. It implements the IDataReader interface and provides pr-row level access to the data:

using (var reader = ExcelReaderFactory.CreateReader(stream)) {
    reader.Read();
    for(int c = 0; c < reader.FieldCount; c += 3) {
        TagListData.Add(new TagClass { IsTagSelected = false, TagName = Convert.ToString(reader.GetValue(c)), rIndex = r, cIndex = c });
    }                    
}

Idea 2: Try to pass ExcelDataSetConfiguration.UseColumnDataType = false to AsDataSet, this eliminates an internal pass and reduces memory pressure, so should improve performance noticably with large sheets

user8728340
  • 646
  • 5
  • 7