I am trying to use R to unzip password-protected files from a drive without using 7-Zip. My organisation doesn't have access to it; we use WinZip for everything.

I have searched far and wide here but cannot find a post that satisfies the question.

I have a zipped file that contains a single XML file. I need to automate the collation of this data; my thinking is to unzip, then read. I have found the following approaches, but I can't see how to make any of them work:

Using unzip(), which does not support passwords - unzip a .zip file

e.g. unzip("file.xml.zip") produces Warning message: In unzip(zipfile = "file.xml.zip") : zip file is corrupt

The file is not corrupt, as I can manually unzip it fine afterwards.

Using 7-Zip (I can't access this) - Unzip a password protected file with Powershell

Reading without unzipping (I get "error reading from the connection") - Extract files from password protected zip folder in R

read_xml(unz("file.xml", "file.xml.zip")) produces Error in open.connection(x, "rb") : cannot open the connection In addition: Warning message: In open.connection(x, "rb") : cannot open zip file 'file.xml'

I have tried looking at Expand-Archive in PowerShell and calling it through R, but am not having much luck. Please, someone help me!

With PowerShell I use Expand-Archive -Path 'file' which produces: Exception calling "ExtractToFile" with "3" argument(s): "The archive entry was compressed using an unsupported compression method."

Mrob
  • What exactly did you try? Can you create some sort of reproducible example that we can use for testing? – MrFlick Feb 09 '23 at 14:47
  • Does your winzip have an executable that can be called from the command line? if so you ought to be able to do something with `system()` or `shell()` to call it. – Miff Feb 09 '23 at 15:01
  • @MrFlick Have padded out a few examples if it's of any use but none of these inherently require a password so am left a bit lost without having 7-zip – Mrob Feb 09 '23 at 15:11
  • @Miff I can start WinZip using `start WinZip` on the command line, but using `system("start WinZip")` results in nothing but a code "127" – Mrob Feb 09 '23 at 15:12

1 Answer

I don't have WinZip, but since both it and unzip.exe (within Rtools-4.2) support password protection, we should be able to use similar methods. (Or perhaps you can use the unzip included with Rtools.)

Setup:

$ echo 'hello world' > file1.txt
$ echo -e 'a,b\n11,22' > file2.csv
$ c:/rtools42/usr/bin/zip.exe -P secretpassword files.zip file1.txt file2.csv
  adding: file1.txt (stored 0%)
  adding: file2.csv (stored 0%)
$ unzip -v files.zip
Archive:  files.zip
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
      12  Stored       12   0% 2023-02-09 10:03 af083b2d  file1.txt
      10  Stored       10   0% 2023-02-09 10:03 1c1d572e  file2.csv
--------          -------  ---                            -------
      22               22   0%                            2 files

$ unzip -c files.zip file1.txt
Archive:  files.zip
[files.zip] file1.txt password:

Okay, now we have a password-protected zip file.

In R,

readLines(pipe("unzip -q -P secretpassword -c files.zip file1.txt"))
# [1] "hello world"
read.csv(pipe("unzip -q -P secretpassword -c files.zip file2.csv"))
#    a  b
# 1 11 22

WinZip does support a command-line interface, so we should be able to use it within pipe (or system or similar). It also supports passwords; I believe it uses the -s argument instead of -P. I don't know if it can extract a file to stdout, so you might need to explore its command-line options for that; if it can't, extract the document to a temporary directory and read it from there.

Or, assuming you have Rtools installed, you can use its unzip as above without relying on WinZip.
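
The temporary-directory route mentioned above could be sketched as follows. This is a minimal sketch, assuming Info-ZIP's unzip (such as the one shipped with Rtools) is on the PATH; `unzip_args` and `extract_protected` are hypothetical helper names, not part of any package. For WinZip's command-line tool the flags would need to be swapped for its own.

```r
# Hypothetical helper: build the argument vector for Info-ZIP's unzip.
unzip_args <- function(zipfile, password, exdir, files = character()) {
  c("-q", "-o",        # quiet; overwrite existing files without prompting
    "-P", password,    # password (visible in the process list, see the note below)
    zipfile, files,    # archive, then the members to extract (all if none given)
    "-d", exdir)       # extract into this directory
}

# Hypothetical helper: extract into a temporary directory and return the paths.
extract_protected <- function(zipfile, password, files = character(),
                              exdir = tempfile("unzipped_"),
                              unzip_bin = Sys.which("unzip")) {
  if (!nzchar(unzip_bin)) stop("no 'unzip' executable found on the PATH")
  dir.create(exdir, showWarnings = FALSE, recursive = TRUE)
  # shQuote() protects spaces and shell metacharacters in the password and paths
  args <- shQuote(unzip_args(zipfile, password, exdir, files))
  status <- system2(unzip_bin, args)
  if (status != 0) stop("unzip failed with exit status ", status)
  list.files(exdir, full.names = TRUE, recursive = TRUE)
}

# Usage, with the archive created in the setup above:
# paths <- extract_protected("files.zip", "secretpassword")
# read.csv(grep("file2\\.csv$", paths, value = TRUE))
```

Checking the exit status means a failed extraction stops with an error instead of silently returning nothing. For the XML case, the extracted path could then be passed to `xml2::read_xml()`.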

Note:

  • Including the password as a command-line argument is relatively unsafe: users on the same host (if a multi-user system) can see the password in clear text by looking at the process list. I'm not certain if there's an easy way around this.
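
A partial mitigation, sketched under assumptions: the password can at least be kept out of the script source and the R history by reading it from an environment variable. The variable name `ZIP_PASSWORD` and the helper `get_zip_password` are hypothetical. This does not address the process-list exposure described above, since the password is still passed to unzip on the command line.

```r
# Hypothetical helper: fetch the password from an environment variable
# rather than hard-coding it in the script.
get_zip_password <- function(var = "ZIP_PASSWORD") {
  pw <- Sys.getenv(var, unset = NA)
  if (is.na(pw) || !nzchar(pw)) {
    stop("environment variable '", var, "' is not set")
  }
  pw
}

# Usage (the variable would be set outside the script, e.g. in .Renviron):
# cmd <- paste("unzip -q -P", shQuote(get_zip_password()), "-c files.zip file1.txt")
# readLines(pipe(cmd))
```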
r2evans
  • I have tried this but to no avail. The only difference I see from your example is that mine has this structure: `readLines(pipe("unzip -q -P password -c file.000.xml.zip file.000.xml"))`, so I am reading XML files (I assume that makes no difference?) and the zip file name ends in .xml.zip – Mrob Feb 09 '23 at 15:28
  • Have you tried quoting the password? `*` and `!` may be causing problems. (I just tried a zip file with _that_ password and do not experience any problems.) Which unzip are you using? `Sys.which("unzip")`. – r2evans Feb 09 '23 at 15:34
  • And what do you mean *"to no avail"*? Is there an error? You might be able to use `read_xml(pipe(...))` instead of `readLines` or `read.csv`. – r2evans Feb 09 '23 at 15:35
  • Apologies, to be clear: `readLines(pipe("unzip -q -P 'password' -c file.xml.zip file.xml"))` returns `character(0)`, and `read_xml(pipe("unzip -q -P 'password' -c file.xml.zip file.xml"))` returns `Error in read_xml.raw(raw, encoding = encoding, base_url = base_url, as_html = as_html, : Failed to parse text`. Quoting the password returns the same errors, and `Sys.which("unzip")` is `"C:\\rtools42\\usr\\bin\\unzip.exe"` – Mrob Feb 09 '23 at 15:51
  • (1) Double-check your password. Assuming you have no other messages/warnings, the only way I can reproduce `character(0)` is if the password is wrong. Or if the file is truly empty. (Missing/wrong filenames emit `caution: filename not matched`.) (2) I can reproduce similar xml results using just `xml2::read_xml(pipe(...))`, no need for `read_xml.raw`, perhaps you're using a package other than `xml2`? – r2evans Feb 09 '23 at 15:58
  • If I follow your example, after creating the files successfully: `c:/rtools42/usr/bin/zip.exe -P secretpassword files.zip file1.txt file2.csv` then gives the error `adding: file1.txt (stored 0%) adding: file2.csv (stored 0%) zip warning: new zip file left as: zihBdrbc zip I/O error: Function not implemented zip error: Could not create output file (was replacing the original zip file)` – Mrob Feb 09 '23 at 16:07
  • Please make sure your issues are not due to existing zip files clouding `zip`'s steps. – r2evans Feb 09 '23 at 16:08
  • Thanks, I'll keep digging, appreciate the help and patience! I have XML and xml2 loaded, might do a full restart and saving of the zip file to see if that irons out any kinks – Mrob Feb 09 '23 at 16:08