I have a 7z archive containing hundreds of text files. I want to load each text file directly into memory.

This code works, but it is slow:

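var memoryStreams = new List<MemoryStream>();
// Dispose the extractor when done so the archive handle is released
using (var st = new SevenZipExtractor("files.7z"))
{
    for (var i = 0; i < st.FilesCount; i++)
    {
        var stream = new MemoryStream();
        st.ExtractFile(i, stream); // one extraction call per file: this loop is the slow part
        memoryStreams.Add(stream);
    }
}
// Read the memoryStreams - at this point it runs very fast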

This takes 15 minutes to run. In comparison it takes ten seconds to unzip to disk, so as a workaround I'm doing exactly that, and then deleting the folder after reading each file into memory. But there must be a better way.

JJ Lison
  • So if you use the same code to write to a file stream instead of memory, it completes in 10 seconds? – Guru Stron Apr 22 '23 at 12:16
  • Not the same code - I am using st.ExtractArchiveAsync to write the files to disk, which completes in seconds. – JJ Lison Apr 22 '23 at 14:03
  • "This code works, but it is slow" with at the end of the code the message "....at this point it runs very fast".. I must be getting old, but I am lost.... – Luuk Apr 22 '23 at 14:08
  • BTW: What is the total size of the "hundreds of text files" (MBs or GBs)? – Luuk Apr 22 '23 at 14:18
  • It takes a good 15 minutes to execute the code above the comment. The unshown code below the comment, which iterates through memoryStreams, converts to strings and parses into structured data, executes very quickly - not even a second. Total size of the uncompressed text files is about 300 MB. – JJ Lison Apr 22 '23 at 14:56
  • Would this link help? https://stackoverflow.com/questions/28256400/how-to-basically-extract-file-with-sevenzipsharp – SoftwareDveloper Apr 22 '23 at 19:38
  • Unfortunately not. That example essentially does the same thing as my comment code, calling ExtractFile on every file one by one, which is unacceptably slow. – JJ Lison Apr 23 '23 at 12:15
  • @JJLison: No, that code also calls `SetLibraryPath()`, which could make a difference. – Luuk Apr 23 '23 at 13:15
  • I tried it. Using SetLibraryPath() did not make any difference - if anything it was a bit slower (it took 16 minutes 32 seconds this time). I expect that calling ExtractFile multiple times is just not efficient (thousands of disk seeks to the same file, redundant data reading, etc) and I need to extract the whole archive at once, but to a memory stream instead of an output folder. – JJ Lison Apr 24 '23 at 12:41
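
The idea in the last comment (extract the whole archive in one pass, but into memory rather than a folder) could be sketched roughly as below. This is an untested sketch, not a confirmed fix: it assumes the SevenZipSharp build in use exposes the callback overload `SevenZipExtractor.ExtractFiles(ExtractFileCallback)`, and that this overload walks the archive once instead of re-decoding it for every entry. The `filesByName` dictionary is a hypothetical name introduced only for this illustration.

```csharp
using System.Collections.Generic;
using System.IO;
using SevenZip;

// Hypothetical container: one MemoryStream per archive entry, keyed by entry name.
var filesByName = new Dictionary<string, MemoryStream>();

using (var extractor = new SevenZipExtractor("files.7z"))
{
    // Assumption: ExtractFiles invokes the callback for each entry during a
    // single pass over the archive. Verify against your SevenZipSharp version.
    extractor.ExtractFiles(args =>
    {
        // Reason == Start is raised before an entry is decoded; this is
        // where the destination for that entry is supplied.
        if (args.Reason == ExtractFileCallbackReason.Start
            && !args.ArchiveFileInfo.IsDirectory)
        {
            var stream = new MemoryStream();
            filesByName[args.ArchiveFileInfo.FileName] = stream;
            args.ExtractToStream = stream;
        }
    });
}

// Read the contents with ToArray(), which does not depend on the stream's
// current position, e.g.:
// string text = System.Text.Encoding.UTF8.GetString(filesByName["some.txt"].ToArray());
```

Whether this closes the gap to the ten-second disk extraction would need to be measured. If the archive is solid, a single pass should at least avoid decompressing the same leading data again for every `ExtractFile` call.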

0 Answers