0

As input, I have a set of excel files with several worksheets inside. I need to export a single csv file for each worksheet. Below is my code which works but it is very slow. It builds upon the solutions proposed in this previous post. Please consider that I have to run this on rather big .xlsx files (approx. 300Mb).

QUESTION: Is there any way to improve this?

void Main()
{
        string folder = @"\\PATH_TO_FOLDER\";

        var files = Directory.GetFiles(folder, "*.xlsx", SearchOption.TopDirectoryOnly);

        foreach (string file in files)
        {

            ConvertToCsv(file,  Directory.GetParent(file) + @"\\output\");
        }


}

public static void ConvertToCsv(string file, string targetFolder)
{

    FileInfo finfo = new FileInfo(file);
    ExcelPackage package = new ExcelPackage(finfo);

    // if targetFolder doesn't exist, create it
    if (!Directory.Exists(targetFolder)) {
        Directory.CreateDirectory(targetFolder);
    }


    var worksheets = package.Workbook.Worksheets;
    int sheetcount = 0;

    foreach (ExcelWorksheet worksheet in worksheets)
    {
        sheetcount++;
        var maxColumnNumber = worksheet.Dimension.End.Column;
        var currentRow = new List<string>(maxColumnNumber);
        var totalRowCount = worksheet.Dimension.End.Row+1;
        var currentRowNum = 1;


        //No need for a memory buffer, writing directly to a file
        //var memory = new MemoryStream();

        string file_name = targetFolder +  Path.GetFileNameWithoutExtension(file) +  "_" + sheetcount + ".csv";

        using (var writer = new StreamWriter(file_name, false, Encoding.UTF8))
        {
            //the rest of the code remains the same


            for (int i = 1; i < totalRowCount; i++)
            {
                i.Dump();
                // populate line with semi columns separators
                string line = "";
                for (int j = 1; j < worksheet.Dimension.End.Column+1; j++)
                {
                    if (worksheet.Cells[i, j].Value != null)
                    {
                        string cell = worksheet.Cells[i, j].Value.ToString() + ";";
                        line += cell;
                    }
                }
                // write line
                writer.WriteLine(line);
            }
        }
    }
}
Nic
  • 1,262
  • 2
  • 22
  • 42
  • [WorkBook.SaveAs() can generate you a CSV](https://stackoverflow.com/questions/5034717/save-an-excel-file-to-a-csv-file-in-c-sharp-code) – Alex K. Aug 17 '17 at 12:43
  • @AlexK. WorkBook.SaveAs() works on a whole workbook. I need to create a csv file for each worksheet. – Nic Aug 17 '17 at 12:54
  • 1
    Sheet.Copy creates a standalone copy that you would save then close. – Alex K. Aug 17 '17 at 13:01
  • I can't see the ExcelWorksheet.Copy() method. I tried to create a new ExcelPackage copying a single worksheet inside. I'm testing the solution to compare the running time – Nic Aug 17 '17 at 13:10
  • Creating a new ExcelPackage for each worksheet and then using the WorkBook.SaveAs() method is slightly faster (7min vs 9min for a 60Kb excel file). – Nic Aug 17 '17 at 13:43
  • If I'm reading between the lines on your code, it looks like you're reading an excel file on the network, and then writing a stream back to the network as you're working? Some apps like Adobe and Office will often have performance issues doing that - have you tried copying the .xlsx locally, working from there, and then copying the final file back to the network after it's completed? – Kevin Aug 17 '17 at 13:49
  • @Kevin you are right. I'm doing a quick check on that. I'll keep you posted – Nic Aug 17 '17 at 13:52
  • Other options are using OLEDB instead of automation: https://stackoverflow.com/questions/6789648/query-excel-sheet-in-c-sharp. If the files are always XLSX then they are XML in Zip files so you could parse that way manually or use the specialised Open XML library https://msdn.microsoft.com/en-us/library/office/gg575571.aspx – Alex K. Aug 17 '17 at 13:59

0 Answers0