6

Background
I have a legacy site that allows authorised users to upload an Excel spreadsheet of product data etc. The site then reads the Excel worksheets and unpacks the data into SQL server. It's an old site and it uses OLE. Old but it works.

The problem
I've recently published the site to an Azure App Service but the existing section of my code that reads from Excel doesn't work (as Azure doesn't have the right driver).

The question
I'm happy to rewrite this section of code but what is the CORRECT or RECOMMENDED approach for reading from Excel using an Azure App Service? I'm not asking about ways that MIGHT work I'm only interested in the RIGHT way to do this.

By "recommendeded" I mean:

  • not unnecessarily complicated. Keep it simple.
  • likely to retain support from Microsoft in the future

I have researched this issue but have not been able to find a clear statement of the best way to do this. If you have experience or knowledge of different ways of doing this I'd be grateful if you could share your conclusion about the BEST way to do this.

Joey Cai
  • 18,968
  • 1
  • 20
  • 30
Mark Challice
  • 411
  • 1
  • 4
  • 5
  • Well, I can't tell if it meets your definition of recommended but we use [this package](https://github.com/tonyqus/npoi) in production, albeit in an azure function. So, using it in a web app should not give any trouble. – Peter Bons Nov 07 '18 at 13:29
  • `xlsx` is a zip package containing XML files. You could read them without any special support, not even using the OpenXML SDK. You can use the OpenXML SDK to read *really* large files, as shown in [How to: Parse and read a large spreadsheet document (Open XML SDK)](https://learn.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet). Or you could use a library like EPPlus, probably the most popular library for creating and reading xlsx files – Panagiotis Kanavos Nov 08 '18 at 08:25
  • Where is SQL Server hosted? Is it Azure SQL or a VM? If it's a VM you could use SSIS to import Excel files from a folder into SQL Server. – Panagiotis Kanavos Nov 08 '18 at 08:26
  • Finallly, regarding "retain support from Microsoft". `xls` was abandoned 12 years ago and even services like Google Docs require a paid subscription to support it. `xlsx` was built specifically to allow anyone to read it without requiring OLEDB drivers. – Panagiotis Kanavos Nov 08 '18 at 08:31
  • Just found this too - [Import data from Excel to SQL Server or Azure SQL Database](https://learn.microsoft.com/en-us/sql/relational-databases/import-export/import-data-from-excel-to-sql?view=sql-server-2017). It shows various methods that can be used to load Excel data from the server's side. Azure Data Factory can also run SSIS packages – Panagiotis Kanavos Nov 08 '18 at 08:41
  • Another thing you'll have to consider is that once you are in the cloud, it's *cheaper and faster* to use the cloud - that means it's probably better to upload files directly to Blob storage instead of going through the service. Uploading a file to Blob storage can trigger other actions, eg a function that reads it and sends it to the database, or Azure Data Factory pipelines. The App service itself should only need to route the uploaded file stream to the blob storage. It *may* or *may not* be cheaper overall than putting all the logic in the App Service – Panagiotis Kanavos Nov 08 '18 at 08:45
  • Hi mark does the answer below work for you? – Ivan Glasenberg Nov 16 '18 at 08:37
  • Mark, any updates on this issue? :) – Ivan Glasenberg Nov 22 '18 at 11:49

2 Answers2

3

There should be many ways you can achieve this, and here I list 2 as below:

1.Use the DocumentFormat.OpenXml, which is published by MS, but it's a little complicated. The demo code is here.

2.Use ExcelDataReader, which is very simple and supports both .xls and .xlsx. You can refer to this article to do it(note that IsFirstRowAsColumnNames property is abandoned, you can see my code below for this change).

And I write a demo with the 2nd method ExcelDataReader.Just for test purpose, I uploaded the excel to the azure web app directory like below:

And the following is the excel content:

Step 1: Create an asp.net MVC project, and then install the latest version ExcelDataReader and ExcelDataReader.DataSet via nuget package manager.

Step 2: Create a ExcelData.cs file in your project which used to read excel file:

Step 3: Write the following code in ExcelData.cs:

using ExcelDataReader;
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;
using System.Linq;

namespace WebApplication42
{
    public class ExcelData
    {
        string _path;
        public ExcelData(string path)
        {
            _path = path;
        }

        public IExcelDataReader GetExcelReader()
        {
            FileStream stream = File.Open(_path, FileMode.Open, FileAccess.Read);
            IExcelDataReader reader = null;
            try
            {
                if (_path.EndsWith(".xls"))
                {
                    reader = ExcelReaderFactory.CreateBinaryReader(stream);
                }
                if (_path.EndsWith(".xlsx"))
                {
                    reader = ExcelReaderFactory.CreateOpenXmlReader(stream);
                }

                return reader;
            }
            catch (Exception)
            {
                throw;
            }
        }

        //read the sheets name if you need
        public IEnumerable<string> GetWorksheetNames()
        {
            var reader = this.GetExcelReader();
            var workbook = reader.AsDataSet();
            var sheets = from DataTable sheet in workbook.Tables select sheet.TableName;
            return sheets;
        }

        //read data in a specified sheet
        public IEnumerable<DataRow> GetData(string sheet)
        {

            var reader = this.GetExcelReader();
            var workSheet = reader.AsDataSet(new ExcelDataSetConfiguration()
            {
                ConfigureDataTable = (_) => new ExcelDataTableConfiguration()
                {
                    //indicates if use the header values
                    UseHeaderRow = true
                }

            }).Tables[sheet];

            var rows = from DataRow a in workSheet.Rows select a;
            return rows;
        }    

    }
}

Step 4: In the controller, call the read excel method:

        public ActionResult Excels()
        {
            ViewBag.Message = "the data from excel:";
            string data = "";

            //your excel path after uploaded, here I hardcoded it for test only
            string path = @"D:\home\site\wwwroot\Files\ddd.xls";
            var excelData = new ExcelData(path);
            var people = excelData.GetData("sheet1");

            foreach (var p in people)
            {
                for (int i=0;i<=p.ItemArray.GetUpperBound(0);i++)
                {
                    data += p[i].ToString()+",";
                }

                data += ";";
            }

            ViewBag.Message += data;

            return View();
        }

Step 5: After publish to azure, launch the site and see the results-> all the data in excel are read:

QHarr
  • 83,427
  • 12
  • 54
  • 101
Ivan Glasenberg
  • 29,865
  • 2
  • 44
  • 60
0

So, I am using https://github.com/dotnetcore/NPOI for Excel Import and have tested on Azure App Service and it is really good. I have tested by importing 50,000 records successfully. But beware, if you want to import about a 100 thousand records, you may get request timeout error as for the long running tasks, one should rather create web jobs/functions. Keep this in mind Azure App Service has a requestTimeout limit of 230s. Considering the following links before you select your implementation will be good.

https://feedback.azure.com/forums/169385-web-apps/suggestions/19309957-allow-a-request-timeout-of-more-then-3-8-minutes

Azure ASP .net WebApp - 500 Error - The request timed out

Gaurav Madaan
  • 459
  • 5
  • 9