Is it possible to temporarily expand the contents of a .ZIP file (7-Zip) to a variable in memory, manipulate the contents and discard it, using PowerShell?

I'm currently expanding the archive, which extracts a "log.dat" file. I then read the contents of this log file, do the analysis, and delete "log.dat". But I have to do this 500,000 times, which can be hard on the drive. My current workaround is to create an R:\ RAM drive and use it like this:

$zipFiles = Get-ChildItem -Filter '*.zip' -r
foreach($zip in $zipFiles) {

    Expand-7Zip -ArchiveFileName $zip.FullName -TargetPath 'R:\'

    Select-String -Path 'R:\log.dat' -Pattern "dataToSearchFor"  | ForEach-Object {
        # do analysis
    }

    Remove-Item 'R:\log.dat'

}

What I need is something like this:

$zipFiles = Get-ChildItem -Filter '*.zip' -r
foreach($zip in $zipFiles) {

    $extractedFiles = Expand-7Zip -ArchiveFileName $zip.FullName

    $logFile = $extractedFiles[0] # log.dat file is unique in file
    Select-String $logFile -Pattern "dataToSearchFor"  | ForEach-Object {
        # do analysis
    }
}

BTW: I have to use the 7-Zip library for PowerShell because of the compression method used for the archives:

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Install-PackageProvider -Name NuGet -MinimumVersion 2.8.5.201 -Force
Set-PSRepository -Name 'PSGallery' -SourceLocation "https://www.powershellgallery.com/api/v2" -InstallationPolicy Trusted
Install-Module -Name 7Zip4PowerShell -Force
Jason Aller
Bruno
  • Use a memory stream. C# code can be converted to PowerShell. See: https://stackoverflow.com/questions/17232414/creating-a-zip-archive-in-memory-using-system-io-compression – jdweng Nov 05 '22 at 23:06
  • I know how to do it in explorer. – js2010 Nov 06 '22 at 15:29

1 Answer

They say "The third time is a charm."
Well, this is my third attempt at solving this. The info from the second attempt is still valid, but only for certain zip files, so you can find it farther down in this answer.

First, install the latest version of 7-Zip from https://www.7-zip.org/.
In my case, I installed 7z2201-x64.exe.

Second, download the NuGet package for SevenZipSharp. Then, using 7-Zip to open the package, navigate to sevenzipsharp.net45.1.0.19.nupkg\lib\net45\ and save SevenZipSharp.dll to the same location as your PowerShell script.

Either of the following seems to work for the download: https://www.nuget.org/api/v2/package/SevenZipSharp.Net45/1.0.19
Or
https://globalcdn.nuget.org/packages/sevenzipsharp.net45.1.0.19.nupkg

Third, take note of where 7-Zip's 7z.dll file is installed. In my case, it was C:\Program Files\7-Zip\7z.dll.

Fourth, add the following lines to the top of your PowerShell script, making sure the path given to SetLibraryPath is the location of 7-Zip's 7z.dll noted in the third step above.

using namespace System.IO
Add-Type -Path "$PSScriptRoot\SevenZipSharp.dll"

[SevenZip.SevenZipExtractor]::SetLibraryPath('C:\Program Files\7-Zip\7z.dll')
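If the install location varies between machines, the hard-coded path could be replaced by a small lookup. This is only a sketch: `Find-SevenZipLibrary` is a hypothetical helper name, and the second candidate path is an assumption about where a 32-bit install would land, so adjust the list to your environment.

```powershell
# Hypothetical helper: return the first candidate path that exists on disk.
function Find-SevenZipLibrary {
    param (
        [string[]]$CandidatePaths = @(
            'C:\Program Files\7-Zip\7z.dll',        # location noted in the third step
            'C:\Program Files (x86)\7-Zip\7z.dll'   # assumed location of a 32-bit install
        )
    )
    foreach ($Candidate in $CandidatePaths) {
        if (Test-Path -LiteralPath $Candidate -PathType Leaf) {
            return $Candidate
        }
    }
    # Fail early with a readable message instead of a later extraction error.
    throw "7z.dll was not found in: $($CandidatePaths -join ', ')"
}
```

After SevenZipSharp.dll has been loaded with Add-Type, the result could be passed on with `[SevenZip.SevenZipExtractor]::SetLibraryPath((Find-SevenZipLibrary))`.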

Fifth, add the code you want to run.

This example reads all the file path names found in the archive file SevenZipTest.zip which is found in the same path as the PowerShell script:

function ReadFilenamesIn7Zip {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path
    )
    [SevenZip.SevenZipExtractor]$ZipArchive = [SevenZip.SevenZipExtractor]::new($Path)
    foreach($ArchiveFileInfo in $ZipArchive.ArchiveFileData) {
        $ArchiveFileInfo.FileName
    }
    $ZipArchive.Dispose()
}

ReadFilenamesIn7Zip "$PSScriptRoot\SevenZipTest.zip"

This example reads all the file lines from the first internal file named Test.TXT that is found in the archive file SevenZipTest.zip which is found in the same path as the PowerShell script:

function ReadFileIn7Zip {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$FileToUnzip,
        [Parameter(Mandatory = $false, Position = 2)]
        [int]$FileIndex = -1
    )
    [SevenZip.SevenZipExtractor]$ZipArchive = [SevenZip.SevenZipExtractor]::new($Path)
    $ThisFileIndex = 0
    foreach($ArchiveFileInfo in $ZipArchive.ArchiveFileData) {
        $FileNameNoPath = Split-Path $ArchiveFileInfo.FileName -leaf
        if($FileNameNoPath -eq $FileToUnzip) {
            if($FileIndex -lt 0 -or $FileIndex -eq $ThisFileIndex) {
                $MemoryStream = [System.IO.MemoryStream]::new()
                $ZipArchive.ExtractFile($ArchiveFileInfo.Index, $MemoryStream)
                $MemoryStream.Position = 0 # Rewind so the reader starts at the first extracted byte
                [StreamReader]$ZipReader = [StreamReader]::new($MemoryStream)
                while ($null -ne ($line = $ZipReader.ReadLine())) {
                    $line
                }
                $ZipReader.Dispose()
                # $MemoryStream.Dispose() # Not needed: https://learn.microsoft.com/en-us/dotnet/api/system.io.memorystream?view=net-6.0#remarks
            }
            $ThisFileIndex++
        }
    }
    $ZipArchive.Dispose()
}

ReadFileIn7Zip "$PSScriptRoot\SevenZipTest.zip" "Test.TXT" 0

The functionality of ReadFilenamesIn7Zip and ReadFileIn7Zip is essentially the same as the ReadFilenamesInZip and ReadFileInZip examples below. For example, if you look at the functionality of the ReadFileInZip function below, when calling it without the -FileIndex parameter, it will return all text from all files matching the -FileToUnzip parameter, which is also true for ReadFileIn7Zip.
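To tie this back to the question, the extracted stream can be searched in memory in place of running Select-String on a file. The following is a minimal sketch, not part of the tested code above: `Search-StreamLine` is a hypothetical helper, and the MemoryStream is filled with made-up sample text standing in for the stream that ExtractFile writes to.

```powershell
# Hypothetical helper: scan a readable stream line by line and emit matches.
function Search-StreamLine {
    param (
        [Parameter(Mandatory = $true)] [System.IO.Stream]$Stream,
        [Parameter(Mandatory = $true)] [string]$Pattern
    )
    $Reader = [System.IO.StreamReader]::new($Stream)
    try {
        $LineNumber = 0
        while ($null -ne ($Line = $Reader.ReadLine())) {
            $LineNumber++
            # -match is a case-insensitive regex test, like Select-String's default
            if ($Line -match $Pattern) {
                [pscustomobject]@{ LineNumber = $LineNumber; Line = $Line }
            }
        }
    }
    finally {
        $Reader.Dispose() # Also closes the underlying stream
    }
}

# Stand-in for the stream filled by ExtractFile(); the text is sample data.
$Bytes  = [System.Text.Encoding]::UTF8.GetBytes("alpha`ndataToSearchFor here`nomega")
$Stream = [System.IO.MemoryStream]::new($Bytes)
$Hits   = @(Search-StreamLine -Stream $Stream -Pattern 'dataToSearchFor')
$Hits
```

Nothing is written to disk here, so neither the RAM drive nor the Remove-Item step from the question would be needed.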

NOTE: Info from the second attempt is below this point.

The info below appears to be valid only for zip files compressed with * Deflate, BZip2, or LZMA.

This example opens the zip file 01_SQLite.zip and searches it for any file named App.config. It is essentially a PowerShell equivalent of the C# code in the link jdweng provided in the comments, adapted for reading instead of creating an archive, with several modifications such as storing the file content in a StringBuilder.

UPDATE: The code was working in VSCode, but I discovered it wasn't working in a PowerShell 5.1 terminal. Both should behave the same, but for some reason they do not, even though VSCode is set to reload PowerShell prior to each run of a script, so there shouldn't be any assemblies pre-loaded.

SOLUTION: Thank you, Santiago. I added Add-Type -Assembly System.IO.Compression, System.IO.Compression.FileSystem to the code. I verified this worked by closing the PowerShell terminal, reopening it, and running the script:

using namespace System.IO
using namespace System.IO.Compression
using namespace System.Text
Add-Type -Assembly System.IO.Compression, System.IO.Compression.FileSystem

$ZipFilePath = "$PSScriptRoot\01_SQLite.zip"

[ZipArchive]$ZipArchive = [ZipFile]::Open($ZipFilePath, [ZipArchiveMode]::Read)

[StringBuilder]$SB = [StringBuilder]::new()
foreach($ZipEntry in $ZipArchive.Entries) {
    if($ZipEntry.Name -eq "App.config") {
        [StreamReader]$ZipReader = [StreamReader]::new($ZipEntry.Open())
        while ($null -ne ($line = $ZipReader.ReadLine())) {
            $null = $SB.AppendLine($line)
        }
        # Do something with the file stored in StringBuilder $SB
        Write-Host "Found file $($ZipEntry.FullName)"
        Write-Host $SB.ToString()
        Write-Host
        $null = $SB.Clear()
        $ZipReader.Dispose()
    }
}
$ZipArchive.Dispose()
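When the whole entry is wanted as a single string and not line by line, StreamReader's ReadToEnd() can replace the StringBuilder loop. The sketch below is only an illustration under stated assumptions: it builds a tiny zip in a MemoryStream so that it runs without any files on disk, and the entry name and content are made-up sample data.

```powershell
Add-Type -AssemblyName System.IO.Compression

# Build a small zip in memory so the sketch needs no files on disk.
$ZipStream = [System.IO.MemoryStream]::new()
$WriteArchive = [System.IO.Compression.ZipArchive]::new(
    $ZipStream, [System.IO.Compression.ZipArchiveMode]::Create, $true) # $true = leave $ZipStream open
$EntryWriter = [System.IO.StreamWriter]::new($WriteArchive.CreateEntry('App.config').Open())
$EntryWriter.Write('<configuration />')
$EntryWriter.Dispose()
$WriteArchive.Dispose() # Writes the central directory

# Reopen the same bytes for reading and pull the entry out in one call.
$ZipStream.Position = 0
$ReadArchive = [System.IO.Compression.ZipArchive]::new(
    $ZipStream, [System.IO.Compression.ZipArchiveMode]::Read)
try {
    $EntryReader = [System.IO.StreamReader]::new($ReadArchive.GetEntry('App.config').Open())
    $Content = $EntryReader.ReadToEnd()
    $EntryReader.Dispose()
}
finally {
    $ReadArchive.Dispose() # Also closes $ZipStream
}
$Content
```

The line-by-line loop remains the better fit when each line is processed on its own or when the entry is large, since ReadToEnd() holds the whole decompressed text in memory at once.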

More Versatile and Useful Code:

This function returns the file paths and names found in the Zip file:

using namespace System.IO
using namespace System.IO.Compression
Add-Type -Assembly System.IO.Compression, System.IO.Compression.FileSystem

function ReadFilenamesInZip {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path
    )
    [ZipArchive]$ZipArchive = [ZipFile]::Open($Path, [ZipArchiveMode]::Read)
    foreach($ZipEntry in $ZipArchive.Entries) {
        $ZipEntry.FullName
    }
    $ZipArchive.Dispose()
}

Example use, reading file pathnames from 01_SQLite.zip file:

$ZipFilePath = "$PSScriptRoot\01_SQLite.zip"
$FileNames = ReadFilenamesInZip -Path $ZipFilePath
$FileNames

Resulting in this output:

screenshot.png
sqlite_test.sln
sqlite_test/App.config
sqlite_test/App.xaml
sqlite_test/App.xaml.cs
sqlite_test/MainWindow.xaml
sqlite_test/MainWindow.xaml.cs
sqlite_test/packages.config
sqlite_test/Properties/AssemblyInfo.cs
sqlite_test/Properties/Resources.Designer.cs
sqlite_test/Properties/Resources.resx
sqlite_test/Properties/Settings.Designer.cs
sqlite_test/Properties/Settings.settings
sqlite_test/sqlite_test.csproj

Example use, reading file pathnames from a zip file I created named TestZip.zip:

$ZipFilePath = "$PSScriptRoot\TestZip.zip"
$FileNames = ReadFilenamesInZip -Path $ZipFilePath
$FileNames

Resulting in this output:

Folder1/Test.TXT
Folder2/Test.TXT
Test.TXT

This function returns the content of all files matching a certain file name:

using namespace System.IO
using namespace System.IO.Compression
Add-Type -Assembly System.IO.Compression, System.IO.Compression.FileSystem

function ReadFileInZip {
    param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path,
        [Parameter(Mandatory = $true, Position = 1)]
        [string]$FileToUnzip,
        [Parameter(Mandatory = $false, Position = 2)]
        [int]$FileIndex = -1
    )
    [ZipArchive]$ZipArchive = [ZipFile]::Open($Path, [ZipArchiveMode]::Read)
    $ThisFileIndex = 0
    foreach($ZipEntry in $ZipArchive.Entries) {
        if($ZipEntry.Name -eq $FileToUnzip) {
            if($FileIndex -lt 0 -or $FileIndex -eq $ThisFileIndex) {
                [StreamReader]$ZipReader = [StreamReader]::new($ZipEntry.Open())
                while ($null -ne ($line = $ZipReader.ReadLine())) {
                    $line
                }
                $ZipReader.Dispose()
            }
            $ThisFileIndex++
        }
    }
    $ZipArchive.Dispose()
}

Example use, extracting from TestZip.zip the content of all internal files matching the file name Test.TXT:

$ZipFilePath = "$PSScriptRoot\TestZip.zip"
$FileLines = ReadFileInZip -Path $ZipFilePath -FileToUnzip 'Test.TXT'
if ($null -ne $FileLines) {
    Write-Host 'Found File(s):'
    $FileLines
} else {
    Write-Host 'File NOT found.'
}

Resulting in this output:

Found File(s):
### Folder 1 Text File ###
Random info in Folder 1 text file
### Folder 2 Text File ###
Random info in Folder 2 text file
### Root Text File ###
Random info in root text file

Example reading the content of only the first file with matching name -
Take note of the added -FileIndex 0:

$ZipFilePath = "$PSScriptRoot\TestZip.zip"
$FileLines = ReadFileInZip -Path $ZipFilePath -FileToUnzip 'Test.TXT' -FileIndex 0
if ($null -ne $FileLines) {
    Write-Host 'Found File(s):'
    $FileLines
} else {
    Write-Host 'File NOT found.'
}

Resulting in this output:

Found File(s):
### Folder 1 Text File ###
Random info in Folder 1 text file

Changing -FileIndex 0 to -FileIndex 2 gives these results:

Found File(s):
### Root Text File ###
Random info in root text file

Changing FileIndex to a value that does not point to a file inside the zip, such as -FileIndex 3, gives these results:

File NOT found.
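For archives that System.IO.Compression can open, the loop from the question could be sketched along the following lines. `Search-ZipEntry` is a hypothetical helper, not tested against the asker's data, and the try/finally blocks are an addition so that each archive is closed even if reading an entry throws partway through a long run. The same shape would presumably apply with SevenZipExtractor, but that variant is not shown here.

```powershell
Add-Type -AssemblyName System.IO.Compression, System.IO.Compression.FileSystem

# Hypothetical helper: search one named entry in every zip under a folder.
function Search-ZipEntry {
    param (
        [Parameter(Mandatory = $true)] [string]$Root,
        [Parameter(Mandatory = $true)] [string]$EntryName,
        [Parameter(Mandatory = $true)] [string]$Pattern
    )
    foreach ($Zip in Get-ChildItem -LiteralPath $Root -Filter '*.zip' -Recurse -File) {
        $Archive = [System.IO.Compression.ZipFile]::OpenRead($Zip.FullName)
        try {
            foreach ($Entry in $Archive.Entries) {
                if ($Entry.Name -ne $EntryName) { continue }
                $Reader = [System.IO.StreamReader]::new($Entry.Open())
                try {
                    while ($null -ne ($Line = $Reader.ReadLine())) {
                        if ($Line -match $Pattern) {
                            # Emit the archive path with each matching line
                            [pscustomobject]@{ Archive = $Zip.FullName; Line = $Line }
                        }
                    }
                }
                finally { $Reader.Dispose() }
            }
        }
        finally {
            $Archive.Dispose() # Runs even if reading an entry throws
        }
    }
}
```

A call such as `Search-ZipEntry -Root . -EntryName 'log.dat' -Pattern 'dataToSearchFor'` would then stream matches to the pipeline without creating any temporary files.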
Darin
  • Unable to find type [ZipFile] – Bruno Nov 06 '22 at 13:12
  • Why do you want to use a StringBuilder when data is binary. Putting binary data into a string will corrupt data. – jdweng Nov 06 '22 at 14:40
  • @Bruno, you have to replace `"$PSScriptRoot\01_SQLite.zip"` with the path of your zip file, and you have to replace `"App.config"` with the log file you are trying to extract. – Darin Nov 06 '22 at 16:18
  • @jdweng, he is using `Select-String` on the file he extracts. What makes you think he is trying to extract a binary file? – Darin Nov 06 '22 at 16:20
  • @Darin I've created an App.config file and compressed it in a 01_SQLite.zip file in the same directory as the script and executed exactly as you posted. – Bruno Nov 06 '22 at 19:19
  • @Bruno, are you including the using statements at the top of the script? Also, what version of PowerShell are you using? – Darin Nov 06 '22 at 19:29
  • @Bruno, I was testing the code in PowerShell 5.1, just now tried it in PowerShell 7.2.6 and it works in both. So the problem has to relate to the using statement `using namespace System.IO.Compression`. If you are including that using statement, then maybe it isn't available in your OS. What OS are you using? – Darin Nov 06 '22 at 19:44
  • @Darin PSVersion 5.1.19041.1682. I've copied and pasted all the code you posted, including the "using" statements. I'm running Windows 10 x64 (10.0.19043). Anyway, as I mentioned, 7-zip must be used otherwise the files won't decompress. – Bruno Nov 06 '22 at 20:50
  • @Bruno, think I found the problem. See the newly added **UPDATE** and **SOLUTION** section for details. – Darin Nov 06 '22 at 21:17
  • You need to add `Add-Type -Assembly System.IO.Compression, System.IO.Compression.FileSystem` after the using statements, otherwise it will fail in 5.1. Also, since you're using a `StreamReader` I don't see the point of appending to a `StringBuilder`; might as well just call `.ReadToEnd()` on it and make it easier. – Santiago Squarzon Nov 06 '22 at 22:34
  • @SantiagoSquarzon, thank you for the help! One of the experiments I did added the needed assembly to the PowerShell terminal and forgot I needed to test in a new terminal to avoid using preloaded assemblies. As for using `.ReadToEnd()`, you are right, but my focus was on giving an example of working with each line one at a time. So for now I'm just leaving it. – Darin Nov 07 '22 at 00:57
  • Great, it now works for a regular .zip file. But for my 7-zip file it throws an 'Exception calling "Open" with "2" argument(s): "End of Central Directory record could not be found." ' error. Is it possible to use the 7-zip library instead? – Bruno Nov 07 '22 at 05:43
  • @Bruno, I did some testing with 7Zip creating .ZIP files. What I found was that when using the archive format `zip`, and any of the compression methods `* Deflate`, `BZip2`, and `LZMA`, this code works. But when setting the compression methods to "Deflate64" or "PPMd", it fails. Can you change the compression method for this file? Also, I'm looking into using 7-zip's DLL to read the files - what you want seems likely possible, but not sure yet. – Darin Nov 08 '22 at 03:23
  • @Bruno, please give the newly added code a try. I placed the code at the top, but left the old code farther down in case someone finds it useful. – Darin Nov 08 '22 at 15:06
  • Perfect solution @Darin , thank you for the effort and explanation. – Bruno Nov 10 '22 at 18:19
  • @Bruno, glad it worked for you! Have a good day! – Darin Nov 10 '22 at 19:50