2

I have 2 GB of data in memory (for example data = b'ab' * 1000000000) that I would like to write in a encrypted ZIP or 7Z file.

How to do this without writing data to a temporary on-disk file?

Is it possible with only Python built-in tools (+ optionally 7z)?

I've already looked at this:

  • ZipFile.writestr writes from a in-memory string/bytes which is good but:

  • ZipFile.setpassword: only for read, and not write

  • How to create an encrypted ZIP file? : most answers use a file as input (and cannot work with in-memory data), especially the solutions with pyminizip and those with:

    subprocess.call(['7z', 'a', '-mem=AES256', '-pP4$$W0rd', '-y', 'myarchive.zip']... 
    

    Other solutions require to trust an implementation of cryptography by a third-party tool (see comments), so I would like to avoid them.

Basj
  • 41,386
  • 99
  • 383
  • 673
  • It seems like the only *actual* question here is about writing the output as encrypted? Unless you have working, native Python code that uses a temporary file? "most answers use a file as input" That doesn't appear to be the case to me; but anyway, did you *try* the answers that don't? For example, the ones referring to `pyminizip`, or `pyzipper`? Alternately, did you try to check if any of the tools you might `subprocess.call`, take the input data from standard input? If they do, do you see how to use pipes to solve the problem? – Karl Knechtel Mar 14 '22 at 08:32
  • 1
    @KarlKnechtel Look at https://github.com/danifus/pyzipper/blob/master/pyzipper/zipfile_aes.py here they are doing crypto themselves, I would like to avoid to trust such a third-party project like this, and use built-in tools (ZipFile, or maybe `7z.exe` if we can pipe binary data to it without writing to a temp file) – Basj Mar 14 '22 at 09:05
  • @KarlKnechtel `pyminizip` works from a file, not from in-memory data. – Basj Mar 14 '22 at 09:10
  • What do you plan to do with the zip file in memory, because it will be destroyed as soon as the process is completed? This is important, because any solution needs to consider this. – Life is complex Mar 17 '22 at 14:28
  • 1
    @Lifeiscomplex once it is encrypted (and only then), it will be written on disk. – Basj Mar 17 '22 at 14:47
  • Thanks for the info. How are you planning to handled key management for the encryption process? Also do you want the ZIP file password protected too? – Life is complex Mar 17 '22 at 14:52
  • @Lifeiscomplex Oh yes, it should be ZIP-password-encrypted or 7Z-password-encrypted. In any case, I don't want to do the encryption part myself, but just use ZIP/7z built-in encryption. – Basj Mar 17 '22 at 15:44
  • I don't know anything about python, but the 7zip C++ SDK provides a stream-based interface to create/extract various archive formats. Perhaps someone has written a python wrapper around it. – Luke Mar 18 '22 at 08:00
  • @Luke I have looked at Python wrappers around 7z library, but for example for this one: https://py7zr.readthedocs.io/en/latest/api.html#compression-methods I see `encryption/decryption: depend on pycryptodomex`. I don't understand why this is the case. I don't want the library to roll its own crypto, but rather the one included in the 7z library directly. – Basj Mar 18 '22 at 08:04
  • @Luke Is there a DLL such as "7z.dll" that we can call from C++ code, that would do the 7z compression + password encryption (using AES)? If so, feel free to post an answer, I'll try to convert the code into Python with [cffi](https://cffi.readthedocs.io/en/latest/). – Basj Mar 18 '22 at 08:07
  • @cffi Yes, it can be done but it requires a lot of plumbing as the 7z API is similar to COM, requiring the caller to implement various interfaces. – Luke Mar 18 '22 at 10:12

5 Answers5

2

ORIGINAL POST 03.19.2022

Here is one way to accomplish your use case using pyzipper

import fs
import pyzipper

# create in-memory file system
mem_fs = fs.open_fs('mem://')
mem_fs.makedir('hidden_dir')

# generate data 
data = b'ab' * 10

secret_password = b'super secret password'

# Create encrypted password protected ZIP file in-memory
with pyzipper.AESZipFile(mem_fs.open('/hidden_dir/password_protected.zip', 'wb'),
                         'w',
                         compression=pyzipper.ZIP_LZMA,
                         encryption=pyzipper.WZ_AES) as zf:
    zf.setpassword(secret_password)
    zf.writestr('data.txt', data)


# Read encrypted password protected ZIP file from memory
with pyzipper.AESZipFile(mem_fs.open('/hidden_dir/password_protected.zip', 'rb')) as zf:
    zf.setpassword(secret_password)
    my_secrets = zf.read('data.txt')
    print(my_secrets)
    # output 
    b'abababababababababab'

UPDATED 03.21.2022

Reading through our comments you continue to raise concerns about the cryptography components of modules, such as pyzipper, but not 7Z LIB/SDK. Here is an academic paper on 7Z LIB/SDK version 19 cryptography.

Based on your concerns have you considered encrypting your data in memory prior to writing it to a zipfile?

Here is an example for doing this and writing the encrypted data to a file in memory:

import os
import fs
import base64

from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

mem_fs = fs.open_fs('mem://')
mem_fs.makedir('hidden_dir')

password = b"password"
salt = os.urandom(16)
kdf = PBKDF2HMAC(
    algorithm=hashes.SHA256(),
    length=32,
    salt=salt,
    iterations=390000,
)

key = base64.urlsafe_b64encode(kdf.derive(password))
f = Fernet(key)

data = b'ab' * 10

encrypted_message = f.encrypt(data)

with mem_fs.open('hidden_dir/encrypted.text', 'wb') as in_file_in_memory:
    in_file_in_memory.write(encrypted_message)
    in_file_in_memory.close()

with mem_fs.open('hidden_dir/encrypted.text', 'rb') as out_file_in_memory:
    raw_data = out_file_in_memory.read()
    decrypted_data = f.decrypt(raw_data)
    print(decrypted_data)
    # output
    b'abababababababababab'

Previously in the comments I mentioned key management, which is similar to maintaining a list of passwords for your zip archives.

I don't know your setup, but you could pregenerate keys in advance and stored them in a secure way for use in your code.

I don't have 7z installed on my Mac, so I could only give you pseudocode. The examples below aren't using 7z.

import os
import fs
import base64
import pyzipper
from zipfile import ZipFile
from cryptography.fernet import Fernet

mem_fs = fs.open_fs('mem://')
mem_fs.makedir('hidden_dir')

# pregenerate key
f = Fernet(b'-6_WO-GLrlXexdSbon_fKJoVOVBh66LdYrEM0Kvcwf0=')

data = b'ab' * 10

encrypted_message = f.encrypt(data)

with mem_fs.open('hidden_dir/encrypted.text', 'wb') as in_file_in_memory:
    in_file_in_memory.write(encrypted_message)
    in_file_in_memory.close()

# This uses standard ZIP with no password, but the data
# is encrypted 
with mem_fs.open('hidden_dir/encrypted.text', 'rb') as out_file_in_memory:
    raw_data = out_file_in_memory.read()
    with ZipFile('archive.zip', mode='w') as zip_file:
        zip_file.writestr('file.txt', raw_data)

# This uses pyzipper to create a password word protected 
# encrypted file, which stores the encrypted.text.
# overkill, because the data is already encrypted prior
with mem_fs.open('hidden_dir/encrypted.text', 'rb') as out_file_in_memory:
    raw_data = out_file_in_memory.read()
    secret_password = b'super secret password'
    # Create encrypted password protected ZIP file in-memory
    with pyzipper.AESZipFile('password_protected.zip',
                             'w',
                             compression=pyzipper.ZIP_LZMA,
                             encryption=pyzipper.WZ_AES) as zf:
        zf.setpassword(secret_password)
        zf.writestr('data.txt', raw_data)

I'm still looking into how to pipe this encrypted.text to subprocess 7-zip.

Life is complex
  • 15,374
  • 5
  • 29
  • 58
  • Thanks! Here https://github.com/danifus/pyzipper/blob/master/pyzipper/zipfile_aes.py they are doing their own crypto, which I wanted to avoid (this is very error-prone except if the author is a known crytography expert). Is there a way to call the official `zip` or `7z` SDK/library and let this actual lib do the encryption part, instead of doing the crypto ourselves? – Basj Mar 18 '22 at 16:49
  • @Basj doesn't `pyzipper` and ` py7zr` use the same crypto, which is a fork of PyCrypto? Are you worried that the cryptography in these modules will be cracked or have an unknown weakness? – Life is complex Mar 18 '22 at 17:28
  • In fact I don't understand why don't these modules just call the ZIP or 7Z LIB/SDK which probably already contains the cryptography part? Do you know why @Lifeiscomplex? And yes, calling PyCryptodome and doing our own stuff (I have already done it too ;)) can always produce an unknown weakness :) I prefer to trust libgzip.dll or lib7z.dll (I don't remember the exact name) which is used by millions of people/projects than a new implementation of 7z encryption done by a lesser-known Python module. – Basj Mar 18 '22 at 17:43
  • The built-in ZIP module doesn't have encryption capabilities or a way to set a password for writing. So far I haven't found an update Python module that implements 7Z LIB/SDK. – Life is complex Mar 19 '22 at 00:12
  • I started looking into the cryptography part of `pyzipper`. The cryptography linked to this module is well used and has 10,000s of users. The cryptography module was first released in 2014 and has gone through around 47 updates. Could there be a hidden flaw? Yes, but any software can have security issues. – Life is complex Mar 19 '22 at 04:04
2

7z.exe has the -si flag, which lets it read data for a file from stdin. This way you could still use 7z's commandline from a subprocess even without an extra file:

from subprocess import Popen, PIPE

# inputs
szip_exe = r"C:\Program Files\7-Zip\7z.exe"  # ... get from registry maybe
memfiles = {"data.dat" : b'ab' * 1000000000}
arch_filename = "myarchive.zip"
arch_password = "Swordfish"

for filename, data in memfiles.items():
    args = [szip_exe, "a", "-mem=AES256", "-y", "-p{}".format(arch_password),
        "-si{}".format(filename), output_filename]
    proc = Popen(args, stdin=PIPE, stdout=PIPE, stderr=PIPE)
    proc.stdin.write(data)
    proc.stdin.close()    # causes 7z to terminate 
    # proc.stdin.flush()  # instead of close() when on Mac, see comments
    proc.communicate()    # wait till it actually has

The write() takes somewhat above 40 seconds on my machine, which is quite a lot. I can't say though if that's due to any inefficiencies from piping the whole data through stdin or if it's just how long compressing and encrypting a 2GB file takes. EDIT: Packing the file from HDD took 47 seconds on my machine, which speaks for the latter.

Jeronimo
  • 2,268
  • 2
  • 13
  • 28
  • I tried and it works even if `data` contains NULL bytes or other binary data! How does it work so that `\x00` or other binary data won't be considered as end-of-stream/pipe/file @Jeronimo? – Basj Mar 23 '22 at 11:15
  • @Basj I just set the archive name as `myarchive.7z` on my Mac and got no errors. – Life is complex Mar 23 '22 at 13:43
  • @Lifeiscomplex Thanks for the report, I'll investigate on this separately (I removed my off-topic comment). – Basj Mar 23 '22 at 14:11
  • True, I get the same error when using a .7z extension (on Windows). I Can't find much more useful info on that flag and how it's to be used anywhere though. :( – Jeronimo Mar 23 '22 at 14:11
  • Regarding the binary data / nullbytes: I'm definitely not an expert here, but I assume it reads from stdin until it reaches an EOF, which is not an actual character (or sequence) in the stream but an option/flag set to the stdin file stream and which can be polled regularly, for example after a read() returned 0 bytes. And that's what's triggered by the `stdin.close()`. On the cmdline using the `a.exe < b.dat` schema, the flag should get set by cmd.exe after the last byte of the file has been written to the stdin of the process. – Jeronimo Mar 23 '22 at 15:29
  • I also noted that on my Mac I couldn't use `proc.stdin.close().`. I had to use `proc.stdin.flush()`. P.S. Thanks for posting this answer I worked for hours trying to use the `-si` switch. You code provided me a way to do this. I'm upvoting this answer. – Life is complex Mar 23 '22 at 17:29
1

It would probably be simplest to use third-party applications such as RadeonRAMDisk to emulate disk operations in-memory, but you stated you prefer not to. Another possibility is to extend PyFilesystem to allow encrypted zip-file operations on a memory filesystem.

Roy Bubis
  • 101
  • 5
0

I don't know about Python, but it is possible to do this using the 7zip C++ interface. However, it is a lot of work. Here's an excerpt from my implementation that I'm using for a project that has to pack zip files:

class CArchiveUpdateCallback : public IArchiveUpdateCallback
{
public:
    STDMETHODIMP GetProperty(UInt32 Index, PROPID PropID, PROPVARIANT* PropValue)
    {
        const std::wstring& FilePath = m_FileList[Index].first;
        const std::wstring& ItemPath = m_FileList[Index].second;
        switch (PropID)
        {
        case kpidPath:
            V_VT(PropValue) = VT_BSTR;
            V_BSTR(PropValue) = SysAllocString(ItemPath.c_str());
            break;
        case kpidSize:
            V_VT(PropValue) = VT_UI8;
            PropValue->uhVal.QuadPart = Utils::GetSize(FilePath);
            break;
        }
        return S_OK;
    }
    STDMETHODIMP GetStream(UInt32 ItemIndex, ISequentialInStream** InStream)
    {
        const std::wstring& FilePath = m_FileList[ItemIndex].first;
        HRESULT hr = CInStream::Create(FilePath, IID_ISequentialInStream, (void**)InStream);
        return hr;
    }
protected:
    std::vector<std::pair<std::wstring, std::wstring>> m_FileList;
};

It currently works exclusively with on-disk files, but could be modified to accommodate in-memory buffers. For example, it could operate on a list of (in-memory) IInStream objects instead of a list of file paths.

Luke
  • 11,211
  • 2
  • 27
  • 38
0

I am not able able to understand why you don't want temporary on-disk file, as it would reduce the complexity.

And yes I have found few solution which requires only built in modules of python:

    • You can use subprocess to interact with powershell and create zip file using powershell command. You can either run the command or save the command in a .ps1 file and execute it. (This solution requires you to install 7zip software)
def run(self, cmd):
    completed = subprocess.run(["powershell", "-Command", cmd], capture_output=True)
    return completed
  • and the powershell code would be:
# Note this code is not written by me, link is provide to the actual owner
function Write-ZipUsing7Zip([string]$FilesToZip, [string]$ZipOutputFilePath, [string]$Password, [ValidateSet('7z','zip','gzip','bzip2','tar','iso','udf')][string]$CompressionType = 'zip', [switch]$HideWindow)
{
    # Look for the 7zip executable.
    $pathTo32Bit7Zip = "C:\Program Files (x86)\7-Zip\7z.exe"
    $pathTo64Bit7Zip = "C:\Program Files\7-Zip\7z.exe"
    $THIS_SCRIPTS_DIRECTORY = Split-Path $script:MyInvocation.MyCommand.Path
    $pathToStandAloneExe = Join-Path $THIS_SCRIPTS_DIRECTORY "7za.exe"
    if (Test-Path $pathTo64Bit7Zip) { $pathTo7ZipExe = $pathTo64Bit7Zip }
    elseif (Test-Path $pathTo32Bit7Zip) { $pathTo7ZipExe = $pathTo32Bit7Zip }
    elseif (Test-Path $pathToStandAloneExe) { $pathTo7ZipExe = $pathToStandAloneExe }
    else { throw "Could not find the 7-zip executable." }

    # Delete the destination zip file if it already exists (i.e. overwrite it).
    if (Test-Path $ZipOutputFilePath) { Remove-Item $ZipOutputFilePath -Force }

    $windowStyle = "Normal"
    if ($HideWindow) { $windowStyle = "Hidden" }

    # Create the arguments to use to zip up the files.
    # Command-line argument syntax can be found at: http://www.dotnetperls.com/7-zip-examples
    $arguments = "a -t$CompressionType ""$ZipOutputFilePath"" ""$FilesToZip"" -mx9"
    if (!([string]::IsNullOrEmpty($Password))) { $arguments += " -p$Password" }

    # Zip up the files.
    $p = Start-Process $pathTo7ZipExe -ArgumentList $arguments -Wait -PassThru -WindowStyle $windowStyle

    # If the files were not zipped successfully.
    if (!(($p.HasExited -eq $true) -and ($p.ExitCode -eq 0)))
    {
        throw "There was a problem creating the zip file '$ZipFilePath'."
    }
}
  1. Using powershell dependency 7zip4PowerShell and then interact with the shell using subprocess. (Link provided)
  • Launch PowerShell with administrative escalation.
  • Install the 7-zip module by entering the cmdlet below. It does query the PS gallery and uses a third-party repository to download the dependencies. If you’re OK with the security considerations, approve the installation to proceed:

Install-Module -Name 7zip4PowerShell -Verbose

  • Change directories to where you want the compressed file saved.
  • Create a secure string for your compressed file’s encryption by entering the cmdlet below:

$SecureString = Read-Host -AsSecureString

  • Enter the password you wish to use in PowerShell. The password will be obfuscated by asterisks. The plain text entered will be converted to $SecuresString, and you’ll use that in the next step.

  • Enter the following cmdlet to encrypt the resulting compressed file:

Compress-7zip -Path "\path ofiles" -ArchiveFileName "Filename.zip" -Format Zip -SecurePassword $SecureString

  • The resulting ZIP file will be saved to the chosen directory once the command has completed processing.

  • You can either follow the process in powershell terminal, or just interact with the terminal after installing the dependency using subprocess.

References:

Faraaz Kurawle
  • 1,085
  • 6
  • 24