I have a basic WinForms app where a user can upload an Excel file (.xlsx), and I want to read the contents of that file, so I am using EPPlus.

The problem is that I am trying to load the contents of a very large Excel file. It has 7 tabs, with one tab having more than 200k rows and another with 70k. The other 5 total about 50k.

These files will also only continue to get bigger. The end goal is to import the data, read it, and, depending on the rules/data I have, write data back into the Excel file and export it.

I am having issues with EPPlus just loading the data, though.

Here is the code:

var file = new FileInfo(filePath);
using (var package = new ExcelPackage(file))
{
    try
    {
        // Get the workbook in the file
        ExcelWorkbook workBook = package.Workbook; // Hangs here for about 2 mins
        if (workBook != null)
        {
            if (workBook.Worksheets.Count > 0)
            {
                // Get the first worksheet (First() needs "using System.Linq;")
                ExcelWorksheet currentWorksheet = workBook.Worksheets.First();
                // gets the currentWorksheet but doesn't evaluate anything...
            }
        }
    }
    catch (Exception)
    {
        // "throw;" (not "throw ex;") preserves the original stack trace
        throw;
    }
}

The code hangs on the package.Workbook line for roughly 2 minutes. Then it gets down into the if, where it gets currentWorksheet, and if I look at the contents of that variable in my watch window, nothing is really loaded, since it displays:

Function evaluation disabled because a previous function evaluation timed out. You must continue execution to reenable function evaluation.

I looked at this link from EPPlus, and the only issue it mentions with loading large files is that you should load from top to bottom and left to right, and it talks about more than 5,000 rows. I have way more than that, so I am wondering whether this is even possible with EPPlus.
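
For reference, this is a minimal sketch of that top-to-bottom, left-to-right access pattern (assuming `currentWorksheet` from the code above and a non-empty sheet; the rule-processing is just a placeholder):

    // Read cells in the order the EPPlus docs recommend: row by row, left to right.
    // Dimension describes the used range of the worksheet.
    var dim = currentWorksheet.Dimension;
    for (int row = dim.Start.Row; row <= dim.End.Row; row++)
    {
        for (int col = dim.Start.Column; col <= dim.End.Column; col++)
        {
            object value = currentWorksheet.Cells[row, col].Value;
            // ... apply the import rules to value here ...
        }
    }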

Additionally, I have done some Google searching, and most of the issues are about not being able to open large Excel files on a server when they open fine locally...

I have also started looking at the Open XML SDK, and it seems to be better performance-wise, but also much more difficult to use code-wise.

Robin
  • This is not my work using Excel; I am just picking up a project where it was left off. I have never used Excel as a database. But in regards to my answer, using EPPlus does work for what I intend to use it for; I was just accessing the cells in an inappropriate way. Also, what I had against Open XML is that (from my standpoint) it looked a little harder to code in compared to EPPlus. Then again, I am not very familiar with it either, so I don't know. Had I needed to go that route, I would have learned. – Robin Nov 21 '14 at 18:04
  • 1
    If you have any control of how the files are generated, and the workbooks are essentially flat files, i.e. no functions/macros/formatting involved, just a data dump, you'd be better off with several separate .csv files instead of an excel file with separate workbooks. Failing that, @DJKRAZE 's suggestion of hammering out an OpenXML solution is the way to go about it. – Dan S. Nov 21 '14 at 18:06

1 Answer

Those sound like pretty big datasets, so you might want to read this:

EPPlus Large Dataset Issue with Out of Memory Exception

Basically, you can run out of RAM with "larger" datasets. But it's not just the number of rows that adds to the size, but also the columns AND the content of each cell. Strings will generally take up a lot more room than numbers, so it is sometimes difficult to predict when EPP will start to have memory issues. Rumor has it the newest version of EPP is better, but I haven't tested it myself.

Seems like you got it working based on your comments, and that's good, but keep the memory limitations in mind. I agree with you - doing it in Open XML would not be a simple exercise.
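
For anyone weighing that option, here is a rough, hypothetical sketch of what the streaming Open XML route looks like - none of this is from the question, and the shared-string lookup is a big part of what makes it fiddly:

    // Hypothetical sketch: stream the first sheet with the Open XML SDK.
    // Rows/cells arrive SAX-style, and shared strings must be resolved by hand.
    // Assumes the workbook actually has a shared-string table.
    using System;
    using System.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Spreadsheet;

    static void StreamFirstSheet(string filePath)
    {
        using (var doc = SpreadsheetDocument.Open(filePath, false))
        {
            WorksheetPart sheetPart = doc.WorkbookPart.WorksheetParts.First();
            SharedStringTable sharedStrings =
                doc.WorkbookPart.SharedStringTablePart.SharedStringTable;

            using (OpenXmlReader reader = OpenXmlReader.Create(sheetPart))
            {
                while (reader.Read())
                {
                    if (reader.ElementType != typeof(Cell) || !reader.IsStartElement)
                        continue;
                    var cell = (Cell)reader.LoadCurrentElement();
                    string text = cell.CellValue == null ? null : cell.CellValue.InnerText;
                    // Shared strings are stored by index and looked up separately.
                    if (text != null && cell.DataType != null
                        && cell.DataType.Value == CellValues.SharedString)
                        text = sharedStrings.ElementAt(int.Parse(text)).InnerText;
                    // ... process text without materializing the whole sheet ...
                }
            }
        }
    }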

Ernie S
  • Thanks for this; fortunately, I did find the issue and was able to resolve it, but I will keep this in mind. Thanks! – Robin Nov 21 '14 at 18:37
  • Is it possible to get around the memory limitation by loading and processing a file in chunks, using something similar to Linq's `Skip(x)` and `Take(x)` methods, or does it load the entire workbook into memory regardless? – Dan S. Nov 21 '14 at 18:37
  • @DanS. I tried that (might have mentioned it in my answer in that linked post). No luck - once you open the file in EPP it all gets loaded no matter what. Like I said, maybe they figured it out in the latest version which is still in beta - be nice if they did. – Ernie S Nov 21 '14 at 19:12
  • @Ernie You did mention it. My reading comprehension tanks on Fridays. – Dan S. Nov 21 '14 at 19:58
  • @Ernie - I got around the memory issue by loading the data in chunks. I had 90K+ records and loaded 10K at a time with no out-of-memory errors. – xaisoft Aug 24 '17 at 12:56
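
xaisoft doesn't show code, but a minimal sketch of that kind of chunked read might look like this (Ernie's caveat still applies - EPPlus loads the whole package when the file is opened, so chunking only bounds what your own code holds at once; `ProcessChunk` is a hypothetical per-batch handler):

    // Pull 10,000 rows at a time as a 2D value array and hand each batch off.
    const int chunkSize = 10000;
    var dim = worksheet.Dimension;
    for (int start = dim.Start.Row; start <= dim.End.Row; start += chunkSize)
    {
        int end = Math.Min(start + chunkSize - 1, dim.End.Row);
        // A multi-cell range's Value is an object[,] of the cell contents.
        var values = (object[,])worksheet.Cells[start, dim.Start.Column, end, dim.End.Column].Value;
        ProcessChunk(values); // hypothetical handler for one batch of rows
    }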