1

ORIGINAL QUERY: Currently i wanted to switch from 32 bit builds to 64 bit, but i have trouble with finiding any library to extract data from xls. I can work with others like (xlsx, xlsm) on ClosedXML, unfortunately xls is not supported.

so right now this connexction will not stand.

    string connectionString = "Provider=Microsoft.ACE.OLEDB.12.0;Data Source=" + path + ";Extended Properties='Excel 12.0;HDR=NO,IMEX=1'";
    string excelQuery = "SELECT * FROM [monthly$A:G]";

Installing 64 bit drivers for OLE is out of a question. So maybe someone has stumble upon it, or have any ideas.

I'd like to build data into datatable with headers similar to HDR=YES and possibility to switch it off like in excample above.

Additionaly, due to excel formats, dapper is out of the question :(

i Tried to find some questions but all I could find would be to install OLEDB Driver for 64 bits.

EDIT: I don't understand why someones is donevoting Panagiotis answer, as it's corrected, thats why i accepted it. He's suggestion to use ExcelDataReader proved to be most efficient as im using more formats of excel like .xlsx, .xlsm & .xlsb in application, additionaly there are some .csv files. This allowed me to build a module, that was sufficient.

He was additionaly right to point out that .xls is ancient, although unfortunately it's not obsolete as a lot of processes i corporation still rely on it. But it should be osbsolete, we are unfortunately stuck due to poorly planned processes not because .xls is useful. Especially those companies should look at the fact that some formats are not supported, or might lose this support as .xls is slowly getting that treatment.

ANOTHER EDIT: Unfortunately I just noticed, and the noob i am I didn't know that nugets could be paid. Aspose, as great it is unfotunately is one of those paid extensions. So i had to change the code and add another converter class. To do so I used & modified code used in another [thread ][1] to my needs.

LAST EDIT:

Ok, after update to ExcelDataReader I wasn't pleased of performance, and converting a file, didn't seemed right to me, as i noticed that @MarkPflug had updated [Sylvan.Data.Excel][2] library, and impressed by benchmarks i wrote code based on that. There were some expetions but after contacting and delivering excamples to Mark, he quickly fixed those. Right now it's fastest and best solution i can think of.

I will update it for my use to extract asynchronously from number of files selected. But based on that, anyone can use it.

**FINAL CODE:**
using SylExcel = Sylvan.Data.Excel;
using SylCSV = Sylvan.Data.Csv;
using SylData = Sylvan.Data;
        /// <summary>
        /// get data from .csv, .xls, .xlsx, .xlsm, .xlsb. Call is then redirected accordingly
        /// </summary>
        /// <param name="_filePath">path to file needed</param>
        /// <param name="_sheetName">sheet name to extract</param>
        /// <param name="_structure">structure of needed file</param>
        public void GetData(string _filePath, string? _sheetName, FilesStructures.Structure _structure)
        {
            //if (CheckIfOpened(_filePath))
            //{
            //    throw new OperationCanceledException(_filePath + " is opened, please close it.");
            //}

            try
            {
                string extension = Path.GetExtension(_filePath);

                switch (extension)
                {
                    case ".csv":
                        SylvanReaderCSV(_filePath, _structure);
                        break;
                    default:
                        SylvanReader(_filePath, _sheetName, _structure);
                        break;
                }

                //collect file information
                GetMetadata(_filePath);
            }
            catch (Exception ex)
            {
                throw new Exception(ex.Message);
            }

        }
        /// <summary>
        /// sylvan reader for .xls, .xlsx, .xlsm, .xlsb
        /// </summary>
        /// <param name="_filePath">path to file needed</param>
        /// <param name="_sheetName">sheet name to extract</param>
        /// <param name="_structure">structure of needed file</param>
        /// <exception cref="Exception"></exception>
        private void SylvanReader(string _filePath, string? _sheetName, FilesStructures.Structure _structure)
        {
            try
            {

                string headerSchema = "";

                //get unique column list to test for duplicates
                List<string> columnList = new List<string>();

                //build headers
                //_structure.HeaderList = List<strings> passed by object _structure
                foreach (var header in _structure.HeaderList)
                {
                    if (!columnList.Exists(column => column == header))
                    {
                        headerSchema += header + ", ";
                        columnList.Add(header);
                    }
                }
                //remove last 2 chars
                headerSchema = headerSchema.Substring(0, headerSchema.Length - 2);
       

                var options = new SylExcel.ExcelDataReaderOptions { Schema = SylExcel.ExcelSchema.NoHeaders, GetErrorAsNull = true };
                //collect data
                using (SylExcel.ExcelDataReader excelDataReader = SylExcel.ExcelDataReader.Create(_filePath, options))
                {
                    //loop to locate sheet
                    while (excelDataReader.WorksheetName != _sheetName)
                    {
                        excelDataReader.NextResult();

                        if (excelDataReader.WorksheetName == null)
                        {
                            throw new Exception("didnt find the sheet");
                        }
                    }
                    //loop to find headers
                    for(int i = 0; i < _structure.StartRow; i++)
                    {
                        excelDataReader.Read();
                    }

                    // parse the schema, and use it to reinitialize the schema for the sheet.
                    var schema = SylData.Schema.Parse(headerSchema);
                    excelDataReader.InitializeSchema(schema.GetColumnSchema(), useHeaders: true);

                    DataTable = new DataTable();
                    DataTable.Load(excelDataReader);
                }
            }
            catch(Exception ex)
            {
                throw new Exception(ex.Message);
            }
        }
        /// <summary>
        /// sylvan reader for .csv
        /// </summary>
        /// <param name="_filePath">path to file needed</param>
        /// <param name="_sheetName">sheet name to extract</param>
        /// <param name="_structure">structure of needed file</param>
        /// <exception cref="Exception"></exception>
        private void SylvanReaderCSV(string _filePath, FilesStructures.Structure _structure)
        {
            try
            {
                string headerSchema = "";

                //get unique column list to test for duplicates
                List<string> columnList = new List<string>();

                //build headers
                //_structure.HeaderList = List<strings> passed by object _structure
                foreach (var header in _structure.HeaderList)
                {
                    if (!columnList.Exists(column => column == header))
                    {
                        headerSchema += header + ", ";
                        columnList.Add(header);
                    }
                }

                //remove last 2 chars
                headerSchema = headerSchema.Substring(0, headerSchema.Length - 2);

                var schema = new SylCSV.CsvSchema(SylData.Schema.Parse(headerSchema));
                var options = new SylCSV.CsvDataReaderOptions //check if get error as null possible
                { 
                    Schema = schema,
                };
                using (SylCSV.CsvDataReader csvDataReader = SylCSV.CsvDataReader.Create(_filePath, options))
                {
                    DataTable = new DataTable();
                    DataTable.Load(csvDataReader);
                }
            }
            catch (Exception ex)
            {
                throw new Exception(ex.Message);
            }
        } 


  [1]: https://stackoverflow.com/questions/28398071/reading-excel-xlsb-files-in-c-sharp
  [2]: https://github.com/MarkPflug/Sylvan.Data.Excel
  • *Don't* use XLS to begin with. That format was replaced by `xlsx` in 2007 - that's *15* years ago. *All* Excel versions use `xlsx` and so do all applications. There are no compatibility concerns - in fact, it's the ancient `xls` that causes compatibility problems. Google Sheets for example doesn't support `xls` files unless you pay – Panagiotis Kanavos Jan 13 '22 at 17:32
  • "Installing 64 bit drivers for OLE is out of a question" why? It helps to know the constraints before suggesting an alternate solution. – D Stanley Jan 13 '22 at 17:42
  • Hey, I both cannot stop using xls ans install 64 bit drivers, its due to the fact that im working on existing infrastructure and this is out of my hand, if it would be up to me we would not use xls in the first place. –  Jan 13 '22 at 19:16

2 Answers2

0

You can check out my library Sylvan.Data.Excel. It has no external dependencies, is cross plat, open-source, MIT licensed, and also the fastest Excel data reader for .NET. It supports both .xls, .xlsx, and .xlsb all through the same API. It is not as fully featured as some other libraries, it only supports reading (not writing), but it should be a pretty direct replacement for ACE, because it implements DbDataReader. If you run into any issues/questions feel free to open them in the GitHub repo.

Using it is pretty straightforward:


var edr = ExcelDataReader.Create("data.xls");

while(edr.Read()) {
  for(int i = 0; i < edr.FieldCount; i++) {
    var value = edr.GetString(i);
  }
}

By default, it will expect the first row of each sheet to contain headers. This can be disabled however. When disabled, columns can only be accessed by ordinal.

var opts = new ExcelDataReaderOptions { Schema = ExcelSchema.NoHeaders };
var edr = ExcelDataReader.Create("data.xls", opts);

Since it implements DbDataReader, loading it into a data table is a one-liner:

var dt = new DataTable();
dt.Load(edr);
MarkPflug
  • 28,292
  • 8
  • 46
  • 54
  • I'll try it tomorrow and get back to you if its working :) –  Jan 13 '22 at 19:22
  • Hello Mark, I just finished it with ExcelDataReader after multiple changes ive gone thru, although looking at your benchamrks i was thinking of swithing it to Sylvan, but i have to ask: I saw on GitHub that you inserted `.xlsb` Benchmarks, did you added this functionality lately and now it supports `.xlsb` without dependencies or does it need `OLE ` drivers? –  Jan 18 '22 at 16:20
  • Yes, I added .xlsb support over the weekend. Still has no external dependencies. If you run into issues feel free to reach out through the github repo. Just make sure you are using the latest pre-release version. I'll push an official build with .xlsb soon. – MarkPflug Jan 18 '22 at 16:40
  • Great news, ill try to build new modules with Sylvan on Friday and test the results. But it would be great not to format file from `xlsb` to `xlsx` do you plan any `csv` support later on? –  Jan 18 '22 at 17:08
  • 1
    I have a separate library Sylvan.Data.Csv that provides CSV support. It is also the fastest CSV parser for .NET. https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers – MarkPflug Jan 18 '22 at 17:19
  • is it possible to select only one worksheet to datatable? –  Jan 18 '22 at 18:47
  • Yes. DataTable.Load will load the *current* sheet. So, if the sheet you want to load isn't the first sheet, you'll need to call NextResult() until the WorksheetName is the one you're looking for. – MarkPflug Jan 18 '22 at 18:52
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/241181/discussion-between-krzysiek-mastalerz-and-markpflug). –  Jan 18 '22 at 19:03
0

First, don't use XLS to begin with. That format was replaced by xlsx in 2007 - that's 15 years ago. All Excel versions use xlsx and so do all applications. There are no compatibility concerns - in fact, it's the ancient xls that causes compatibility problems. Google Sheets for example doesn't support xls files unless you pay. The only reason xls persists is inertia.

If you insist on using xls (why?) you can use ExcelDataReaader. That library can read both xls and xlsx files and return either an DbDataReader or a DataSet. The NuGet package has 17M downloads making it the 2nd most popular Excel library after EPPlus.

The example in the repository's pages shows how to use it in the simplest case - ExcelReaderFactory.CreateReader(stream) returns a DbDataReader :

using (var stream = File.Open(filePath, FileMode.Open, FileAccess.Read))
{
    // Auto-detect format, supports:
    //  - Binary Excel files (2.0-2003 format; *.xls)
    //  - OpenXml Excel files (2007 format; *.xlsx, *.xlsb)
    using (var reader = ExcelReaderFactory.CreateReader(stream))
    {
        // Choose one of either 1 or 2:

        // 1. Use the reader methods
        do
        {
            while (reader.Read())
            {
                // reader.GetDouble(0);
            }
        } while (reader.NextResult());

        // 2. Use the AsDataSet extension method
        var result = reader.AsDataSet();

        // The result of each spreadsheet is in result.Tables
    }
}
Panagiotis Kanavos
  • 120,703
  • 13
  • 188
  • 236
  • @MarkPflug yes, he would. Because `xls` is obsolete and not required by *any* application, while it *does* cause compatibility and driver problems. So using `xls` is a really bad idea. – Panagiotis Kanavos Jan 13 '22 at 17:43
  • "xls is obsolete". That's simply not true. Excel will still open and save that format, so it is not obsolete, and it never will be. – MarkPflug Jan 13 '22 at 17:44
  • 1
    Hello, xls is not my idea, unfortunately this is what im stuck with, if it would up to me i would sourca data diretly by procedure or API, but im gettin a push back there. The format is saved by a party that would not change it as there is plenty of old macros written to use same file and have poorly written code with path embeddet onto it, even when i asked about a list so i could fox them no one was able to help. Unfortunately in old and big corporations nothing really dies, such as xls format... –  Jan 13 '22 at 19:21
  • I will also test ExcelDataReaader in addition to one from Mark reply, and test the speeds on both solutions speeds and stability, thanks guys :) –  Jan 13 '22 at 19:24
  • @Panagiotis Kanavos, did you had any luck in using ExcelDataReader in net5.0? I added: `using ExcelDataReader;` and `var readData= new ExcelReaderFactory.CreateBinaryReader(stream)` The problem is that i get info that `CreateBinaryReader` doesn't exists in type `ExcelReaderFactory`, but when i peek definition for `ExcelReaderFactory` its clearly there. –  Jan 14 '22 at 09:14
  • Yes, and .NET 6. You're probably missing a `using` clause. I use it to load tons of very unfortunately formatted Excels with invoices from suppliers that can't even be bothered to preserve the same format from one month to the next. One especially evil Russian supplier won't just include multiple sections in the same file - they'll even mix up Latin and Cyrillic in the Form-Of-Payment field. The difference between English and Cyrillic `CC` is only a couple of pixels. – Panagiotis Kanavos Jan 14 '22 at 09:20
  • You were right, once i used lines from their github `using (var stream = File.Open(_filePath, FileMode.Open, FileAccess.Read))` & then inside this: `using (var reader = ExcelReaderFactory.CreateReader(stream))` it seems to work, although i do not understand why. –  Jan 14 '22 at 09:26