3

I wanted to load .xls file(Type: 97-2003 spreadsheet) using C#. I am using Gembox library.

When I used below command, I encountered "file contains corrupted data." as error.

ExcelFile ef = ExcelFile.Load(filepath, XlsxLoadOptions.XlsxDefault);

When I removed XlsxLoadOptions parameter than I am getting "Reading error: file is not a valid OLE2 Compound File."

I am new to C# and unable to debug the root-cause of the issue. Please help!

thmshd
  • 5,729
  • 3
  • 39
  • 67
RISHI KHANNA
  • 354
  • 5
  • 23
  • Please post the relevant piece of code you have written so that people can help – Souvik Ghosh Nov 21 '17 at 06:20
  • ExcelFile ef = ExcelFile.Load("C://temp//book.xls"); Code is failing at the beginning itself while loading the .xls file. – RISHI KHANNA Nov 21 '17 at 06:23
  • 1
    Does this particular file work when uploaded to their example page? You can test your File here: https://www.gemboxsoftware.com/spreadsheet/examples/c-sharp-read-excel/401 – thmshd Nov 21 '17 at 06:27
  • @thmshd Yes it did worked there. I do not see any error while running the example: – RISHI KHANNA Nov 21 '17 at 06:34
  • SpreadsheetInfo.SetLicense("FREE-LIMITED-KEY"); ExcelFile ef = ExcelFile.Load("Cornerstone_Session_Signin-11_21_2017.xls"); – RISHI KHANNA Nov 21 '17 at 06:34
  • As vijayvicks mentioned, does your XLS file have workbook protection? Note that workbook protected and/or encrypted XLS files are not supported by GemBox.Spreadsheet. Also just as an FYI the workbook protected and/or encrypted XLSX files are supported. – Mario Z Nov 21 '17 at 06:39
  • @RISHIKHANNA very strange if sample works on the remote Page. I'd assume, when you copy-paste the entire example and run locally, it should also run. – thmshd Nov 21 '17 at 06:43
  • I have not set any protection level on the file. Although when I tried to open the file manually i am getting this "The file format and extension does not match. File could be corrupted or unsafe" – RISHI KHANNA Nov 21 '17 at 06:54
  • 1
    @RISHIKHANNA I see now, so your XLS file is of some other format. Can you try the following: "ExcelFile ef = ExcelFile.Load(filepath, LoadOptions.HtmlDefault);" or perhaps the following: "ExcelFile ef = ExcelFile.Load(filepath, LoadOptions.CsvDefault);". If those do not work then can you upload somewhere your file so that I can take a look at it? – Mario Z Nov 21 '17 at 06:56
  • @MarioZ: Thank you very much.. By changing it to HtmlDefault it did work..But just for my understanding what happens here when we change LoadOptions? – RISHI KHANNA Nov 21 '17 at 07:06
  • 1
    @RISHIKHANNA with LoadOptions you specify how to read the input file. In this case you had a file that has ".xls" extension, but it is not of XLS format but rather it is if a HTML format. To read that file correctly, you need to treat it as any other HTML file, in other words you need to specify that GemBox.Spreadsheet must read it as a HTML file. – Mario Z Nov 21 '17 at 07:17
  • Yep..got it now..thanks for the info. – RISHI KHANNA Nov 21 '17 at 07:18

1 Answers1

4

UPDATE (2020-03-28)

In the newer versions of GemBox.Spreadsheet, the ExcelFile.Load(String) will check the file's signature in case of a ".xls" file.
In other words, there is no more need for that GetLoadOptions method from below.

Also, there is a new overload method, ExcelFile.Load(Stream).
This one will always check the file's signature in the provided stream.

ORIGINAL

This question was answered in the comments. Unfortunately, it is not well visible there, so here is the answer and also some additional details about it.

GemBox.Spreadsheet provides few Load overload methods. When using the following:

ExcelFile ef = ExcelFile.Load("C://temp//book.xls");

It will result in following:

ExcelFile ef = ExcelFile.Load("C://temp//book.xls", LoadOptions.XlsDefault);

Which is the same as the following:

ExcelFile ef = ExcelFile.Load("C://temp//book.xls", new XlsLoadOptions());

The load options specify how the input file will be read and when using the ExcelFile.Load(String) method, the options will be based on the file's extension.

In this case, the file has ".xls" extension, however, it was not of a binary XLS format (BIFF8), but rather it was of an HTML format. This is a trick that is somewhat commonly used, you can have HTML, CSV, even XLSX files with a ".xls" extension and MS Excel will be able to open it. It will detect the right file's format and it will prompt the user with something like the following message:

File format and extension of 'book.xls' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?

Note that this trick only works with ".xls" extension, it does not work with for example ".xlsx". Nevertheless, we could use something like the following to detect the right file format:

private static LoadOptions GetLoadOptions(string path)
{
    string extension = Path.GetExtension(path).ToUpperInvariant();
    switch (extension)
    {
        case ".XLSX":
        case ".XLSM":
        case ".XLTX":
        case ".XLTM":
            return LoadOptions.XlsxDefault;
        case ".XLS":
        case ".XLT":
            return GetLoadOptions(path, null);
        case ".ODS":
        case ".OTS":
            return LoadOptions.OdsDefault;
        case ".TAB":
        case ".TSV":
            return new CsvLoadOptions(CsvType.TabDelimited);
        case ".CSV":
            return LoadOptions.CsvDefault;
        default:
            return null;
    }
}

private static LoadOptions GetLoadOptions(string xlsPath, LoadOptions defaultOptions)
{
    byte[] signature = new byte[8];
    using (var stream = File.OpenRead(xlsPath))
        stream.Read(signature, 0, 8);

    byte[] xlsSignature = new byte[] { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };
    if (signature.SequenceEqual(xlsSignature))
        return LoadOptions.XlsDefault;

    byte[] xlsxSignature = new byte[] { 0x50, 0x4B, 0x03, 0x04 };
    if (signature.Take(4).SequenceEqual(xlsxSignature))
        return LoadOptions.XlsxDefault;

    string firstLine = File.ReadLines(xlsPath)
        .First(line => !string.IsNullOrWhiteSpace(line)).TrimStart().ToUpperInvariant();
    if (firstLine.StartsWith("<!DOCTYPE") ||
        firstLine.StartsWith("<HTML") ||
        firstLine.StartsWith("<BODY"))
        return LoadOptions.HtmlDefault;

    return defaultOptions;
}

Also here is a small demonstration example on how to use it:

string filepath = "C://temp//book.xls";
LoadOptions options = GetLoadOptions(filepath);

if (options == null)
    throw new FileFormatException();

ExcelFile ef = ExcelFile.Load(filepath, options);
// ...
Mario Z
  • 4,328
  • 2
  • 24
  • 38