0

DESCRIPTION: I am using RStudio and I have a zip file called databse.zip. The file is encrypted by a password. My goal is to be able to load the file, decrypt it, extract the sqlite database inside (database.db) and be able to make a query to it.

So far I tried the following:

# Load required packages
library(dotenv)
library(Hmisc)
library(RSQLite)

# Get password
password = Sys.getenv("PASSWORD")

# Load .zip file and decrypt it
test = getZip("databse.zip", password=password)

# Connect to the SQLite database
con = dbConnect(RSQLite::SQLite(), test) # ERROR! Error in path.expand(path) : invalid 'path' argument

# Get list of items from table in the database
my_data = dbGetQuery(con, "SELECT column_name FROM table") # I do not even reach this point

PROBLEM: Basically, I am able to load and decrypt the file but then I have no idea how to connect to the sqlite database and make a query to it.

QUESTION: Would you be able to suggest a smart and elegant way to achieve my goal please? Feel free to use different packages if needed.

Federico Gentile
  • 5,650
  • 10
  • 47
  • 102
  • So while you are able to load and use unencrypted database files, you for some reason have problems applying the same method to the unencrypted database which is the result of decrypting an encrypted file. Right? What makes you think that the decrypting (unzipping with password) was sucessful? – Yunnosch Jan 09 '23 at 08:54
  • I mean I can load the file and decrypt it but then I cannot get the data out of it. So My code cannot reach the end – Federico Gentile Jan 09 '23 at 09:01
  • What makes you think that it was successful? Were you successful to do whatever you are trying here with an already unencrypted file? – Yunnosch Jan 09 '23 at 09:59
  • yes because I am able to load and decrypt the file with the function getZip(). My problem is there I have no idea about how to read the data from the table – Federico Gentile Jan 09 '23 at 10:06
  • 2
    You have to provide a path to your sqlite file in `dbConnect(RSQLite::SQLite(), test)` , but `getZip()` returns a connection object. And by any chance, are you on Windows? Just by looking at the source, I'd guess it's likely to fail on most Win installations as `getZip ` assumes there's `unzip` in the environment path - https://github.com/harrelfe/Hmisc/blob/b0470d70dd2adac867127f8a63eb4ea58b7be7b4/R/Misc.s#L1419 . Check https://stackoverflow.com/questions/37665451/unzip-password-protected-zip-files-in-r if you are OK with extracting with 7z. – margusl Jan 09 '23 at 10:07
  • I tried the suggestion you propose in the stackoverflow link. However, after I call the: sys_command <- paste0("7z ", "x ", file, " -p", pw) system(sys_command) I am not sure what to do. How can I connect to the sqlite file? – Federico Gentile Jan 09 '23 at 10:42
  • @margusl, fyi, win11 R-4.2, `Sys.which("unzip")` for me returns `"C:\\rtools42\\usr\\bin\\unzip.exe"`, which should be available for all (non-austere) installations of R on windows. – r2evans Jan 09 '23 at 14:10
  • I cannot get `DBI`/`RSQLite` to operate on a zipped sqlite3 file, even without a password. I think @margusl's comment about the "connection object" is the important point here: `RSQLite` requires a path to an actual file (or `":memory:"`), not a connection, since it needs to do its own random-access in its own way. I don't think you'll be able to access the database in R without fully extracting out of the zip file. (I can't get it to work using the `sqlite3.exe` binary either, it doesn't work on a pipe either.) – r2evans Jan 09 '23 at 14:18
  • 1
    @r2evans, good point. I guess it's more of an indicator of the last time I built a package from source :) , must say it didn't occur to me rtools also provides a set of command line tools – margusl Jan 09 '23 at 15:16

1 Answers1

1

getZip() is used to stream a content of a single file form zip archive without storing it on a disk first, it returns connection object. This would work with flat files like zipped csv-s while using it with e.g. read.csv(). For DBI / RSQLite you (generally) need to have a file on your disk.

As R can't handle password-protected zip archives natively, we use 7z, though any other suitable command line extractor would work as well (e.g. unzip from rtools in case of Win).

Here we first create password-protected zip arhive to create a reproducible example, then extract it to extracted/ in R session tmp. directory and connect to extracted SQLite database.

library(DBI)
library(glue)

# 7z executable
sevenz <- "C:/Program Files/7-Zip/7z.exe"
archive_pw <- "pw123"

tmp_dir <- tempdir()
# open temp folder in file manager :
# browseURL(tmp_dir)

# prepare reprex ----------------------------------------------------------
# store mtcars in sqlite, create pasword-protected zip archive

sqlite_f <- file.path(tmp_dir, "mtcars.sqlite")
con <- dbConnect(RSQLite::SQLite(), sqlite_f)
data("mtcars")
dbWriteTable(con, "mtcars", mtcars)
dbDisconnect(con)
# double-quotes to handle spaces in paths
system(glue('"{sevenz}" a "{sqlite_f}.zip" "{sqlite_f}" -p"{archive_pw}"' ))
#> [1] 0


# extract and load sqlite DB from password-protected zip ------------------

dest <- file.path(tmp_dir, "extracted")
system(glue('"{sevenz}" e "{sqlite_f}.zip" -o"{dest}" -p"{archive_pw}"' ))
#> [1] 0
con <- dbConnect(RSQLite::SQLite(), file.path(dest, "mtcars.sqlite"))
dbGetQuery(con, "SELECT * FROM mtcars LIMIT 5")
#>    mpg cyl disp  hp drat    wt  qsec vs am gear carb
#> 1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#> 2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#> 3 22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#> 4 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
#> 5 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
dbDisconnect(con)

Created on 2023-01-10 with reprex v2.0.2

margusl
  • 7,804
  • 2
  • 16
  • 20