0

Right now i am getting the data from excel and looping through the rows and based on condition i am processing the results like storing the result in an object for further processing.

The excel sheet is about 20 MB and the record count is closely to 7000 and i am using open xml to grab the data from excel file like as mentioned in below code.

        string filePath = @"C:\weather-Data\DesignConditions_p.xlsx";
        using FileStream fs = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        using SpreadsheetDocument doc = SpreadsheetDocument.Open(fs, false);
        WorkbookPart workbookPart = doc.WorkbookPart;
        SharedStringTablePart sstpart = workbookPart.GetPartsOfType<SharedStringTablePart>().First();
        SharedStringTable sst = sstpart.SharedStringTable;

        Sheet firstSheet = workbookPart.Workbook.Descendants<Sheet>().First();
        Worksheet sheet = ((WorksheetPart)workbookPart.GetPartById(firstSheet.Id)).Worksheet;

        var rows = sheet.Descendants<Row>();
        var weatherDataList = new List<WeatherStation>();
        foreach (Row row in rows.Skip(5)) // it is taking almost more than 60 minutes to process and enter into the if loop below (country.Equals("USA"))
        {
            var weatherData = new WeatherStation();
            string country = GetCellValue(filePath, "Annual", $"B{row.RowIndex.ToString()}");
            if (country.Equals("USA"))
            {
                weatherData.CountryAbbreviation = country;
                weatherData.StateAbbreviation = GetCellValue(filePath, "Annual", $"C{row.RowIndex.ToString()}");
                weatherData.Number = GetCellValue(filePath, "Annual", $"E{row.RowIndex.ToString()}");
                ......
                .......
            }
        }

Could any one please point me in the right direction to optimize the processing time while reading the data from excel , I am using .Net Core for this application

Thanks in advance.

Glory Raj
  • 17,397
  • 27
  • 100
  • 203
  • If all you're interested in is the data (as opposed to the formatting), consider a direct data connection, such as ADO.NET via ADODB (on Windows) or via ODBC. Or perhaps something like https://github.com/ExcelDataReader/ExcelDataReader. – Zev Spitz Aug 13 '20 at 18:49
  • @ZevSpitz i need to get the cells based on particular row index – Glory Raj Aug 13 '20 at 19:06
  • You could construct your own index. You could either use the overload of `Enumerable.Select` which provides an index, or you could create your own variable to hold the index. – Zev Spitz Aug 13 '20 at 22:55

1 Answers1

1

You could use the 'SAX' approach, that way you're reading the file in parts so processing and IO could be faster.:

// The SAX approach.
    static void ReadExcelFileSAX(string fileName)
    {
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
        {
            WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

            OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);
            string text;
            while (reader.Read())
            {
                if (reader.ElementType == typeof(CellValue))
                {
                    text = reader.GetText();
                    Console.Write(text + " ");
                }
            }
            Console.WriteLine();
            Console.ReadKey();
        }
    }

https://learn.microsoft.com/en-us/office/open-xml/how-to-parse-and-read-a-large-spreadsheet

Other than that you could look for a library/nuget package that has some faster reading, because i see no impact-full ways to tune this code more.

sommmen
  • 6,570
  • 2
  • 30
  • 51
  • i need to get cell values based on row index and i am not sure how can i implement that at here with SAX approach – Glory Raj Aug 13 '20 at 18:57
  • @EnigmaState you can use `Read()` and `Skip()` to control the flow https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.openxmlreader?view=openxml-2.8.1#methods but its also possible to read row by row. see: https://stackoverflow.com/questions/10555507/using-openxmlreader Is the data always 5 rows down? – sommmen Aug 14 '20 at 07:32