
I'm trying to create a backup of approximately 1,400 repositories. These repos are all cloned to a VM. Now I need to compress them into one zip file, but this takes a long time. Currently I've set a cap of 600 minutes, which my current implementation reaches before being 2/10 done.

I'm using PowerShell 7.2.2 and have tried both Compress-Archive and the 7Zip4Powershell module. I also tried splitting the repos into smaller batches, since compressing all the repos at once maxed out the RAM on the VM; with batches I could run garbage collection between each batch, so RAM usage wasn't a problem. However, appending files to an existing zip seems to unpack the original first before adding the new files (AFAIK), which probably just makes it take longer.

So does anyone have tips for an implementation that would take less than 600 minutes? It does not have to be a PowerShell solution.

It's about ~2 million files and ~150 GB in size. These are the two per-batch commands I've tried:

# 7Zip4Powershell attempt (per batch, appending to the existing archive):
Compress-7Zip -Path "C:\repos\batchX" -ArchiveFileName $zipfile -Append -Format Zip -CompressionLevel Fast -CompressionMethod Lzma2

# Compress-Archive attempt (per batch, updating the existing archive):
Compress-Archive -Path "C:\repos\batchX\*" -DestinationPath $zipfile -CompressionLevel "Optimal" -Update
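
For reference, a single-pass alternative that sidesteps the append/rewrite problem might look like the following minimal sketch, assuming .NET's ZipArchive API (which ships with PowerShell 7); the paths and $zipfile are placeholders. The archive is opened once in Create mode and every file is streamed in, so nothing is ever unpacked or re-packed, and memory stays flat:

# Minimal sketch (paths are placeholders): open the archive once in Create mode
# and stream every file in, so nothing is ever unpacked or re-packed on append.
Add-Type -AssemblyName System.IO.Compression.FileSystem

$source  = 'C:\repos'
$zipfile = 'C:\backup\repos.zip'

$archive = [System.IO.Compression.ZipFile]::Open($zipfile, [System.IO.Compression.ZipArchiveMode]::Create)
try {
    Get-ChildItem -LiteralPath $source -Recurse -File | ForEach-Object {
        # entry name = path relative to $source, with zip-style forward slashes
        $entryName = $_.FullName.Substring($source.Length + 1).Replace('\', '/')
        $null = [System.IO.Compression.ZipFileExtensions]::CreateEntryFromFile(
            $archive, $_.FullName, $entryName,
            [System.IO.Compression.CompressionLevel]::Fastest)
    }
}
finally {
    $archive.Dispose()  # flush and close the archive exactly once
}
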
  • Have you checked how much space you save with compression on a subsection of the files? Maybe you could skip compression and just archive them, and see how fast that is. – OrigamiEye Jun 23 '22 at 13:28
  • Use `[IO.Compression.ZipFile]::CreateFromDirectory($sourceDirectoryName, $destinationArchiveFileName)`. Aside from that, 7zip is probably faster. If you're okay with compressing the files in chunks, [this function](https://stackoverflow.com/a/72611161/15339544) can help; it's faster than `Compress-Archive` for sure. – Santiago Squarzon Jun 23 '22 at 13:29
  • `7Zip4Powershell` seems good, as it uses `7z.dll` to do its deed. Don't do multiple batches; do it all at once. Multiple zipping operations running at the same time fight for the same resources, and that is usually not a 1:1 ratio: four things being zipped concurrently do not get 25% of the resources each, since more concurrent access slows the hard drive / CPU down further. Furthermore, 7z (used in the module) usually uses all the available CPU to zip, so doing one zip at a time is best. You are better off zipping all at once. – Sage Pourpre Jun 23 '22 at 13:32
  • I don't know the context of your environment or the frequency of the backups, but if you can easily scale your VM up and down, you could scale it up to do your backup. More CPU / faster disks will result in faster compression. In cloud infrastructure such as Azure / AWS, you only pay more for the time it is scaled up, so this can be a viable strategy. I understand it might also not be possible if resources are limited or if you can't scale at will. – Sage Pourpre Jun 23 '22 at 13:34
  • Try to explicitly enable [multithreading](https://github.com/thoemmi/7Zip4Powershell#customization), e.g. `$compressor.CustomParameters.Add("mt", (Get-CimInstance Win32_ComputerSystem).NumberOfLogicalProcessors)` (sketched below, after these comments). You may reduce the number of threads if it uses up too much RAM. – zett42 Jun 23 '22 at 13:57
  • Also, on the hardware side, these VMs should run on NVMe SSDs with high random access speed, to optimize for a high number of small files. If you had mostly big files, then sequential read speed would be more important. – zett42 Jun 23 '22 at 15:49
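
Following zett42's comment above, a minimal sketch of the multithreading tweak, assuming 7Zip4Powershell exposes the -CustomInitialization script block described in the linked README section ($zipfile is a placeholder):

# Sketch of the multithreading tweak from the comments. Assumes the module's
# -CustomInitialization hook (per the README linked above); $zipfile is a placeholder.
Compress-7Zip -Path 'C:\repos' -ArchiveFileName $zipfile -Format Zip -CompressionLevel Fast -CustomInitialization {
    param ($compressor)
    # one compression thread per logical processor; lower it if RAM runs out
    $threads = (Get-CimInstance Win32_ComputerSystem).NumberOfLogicalProcessors
    $compressor.CustomParameters.Add('mt', "$threads")
}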

1 Answer


Try `Fastest` instead of `Optimal` for `Compress-Archive`. Make sure that you can extract the files afterwards; there is a note in the documentation about a 2 GB limit.
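
Applied to the per-batch command from the question, that would be, for example:

Compress-Archive -Path "C:\repos\batchX\*" -DestinationPath $zipfile -CompressionLevel Fastest -Update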

For `Compress-7Zip`, try `Deflate` instead of `Lzma2`.
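
Again applied to the call from the question, that would be roughly:

Compress-7Zip -Path "C:\repos\batchX" -ArchiveFileName $zipfile -Append -Format Zip -CompressionLevel Fast -CompressionMethod Deflate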

– Mark Adler