Alternative ways to extract .csv file from .zip file in FTP server

Question

I have written code that downloads a .zip file to my computer and then extracts a .csv file from it. However, I do not want to save the .zip file to my computer. Instead, I would like to read the .csv file directly into my working environment. Is there another way to do this?

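# Download the archive to a temporary file (binary mode keeps the zip intact)
temp <- tempfile()
download.file("ftp://usr:pwd@ftp.XXXX.com//Folder1/Folder2/file.zip", temp, mode = "wb")
# Read the CSV straight out of the archive through an unz() connection
df <- read.csv(unz(temp, "my_csv.csv"), header = TRUE, sep = ",")
# Delete the temporary archive
unlink(temp)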

Even if you used a package for this (one probably exists), the zip would still be written to a temp file. Your solution makes that fact transparent, and it also helps to know which files need to be deleted afterwards. — jay.sf, Dec 31 '21 at 14:16
FTP does not have a mechanism for downloading part of a zip file, just for downloading the file itself. Because of that, *any* client (not just R-based) will need to download the entire zip file in order to get at a file. It is possible to download a file into RAM only (and not onto the hard drive as a temp file), but this comes with liabilities that are not easy to side-step. Because of this, the most efficient process has been to download into a temp file and do what is needed after download. If you have avenues other than FTP, explore them, else I think there is no way to do what you ask. — r2evans, Dec 31 '21 at 14:27
The alternative to unzipping it on the client computer is to unzip it on the server but to do that you would need to have ssh access to it. — G. Grothendieck, Dec 31 '21 at 15:02
And if you have ssh access, then it might be as direct as `ssh user@remote 'unzip -p file.zip innerfile.csv'` and capture the output. — r2evans, Dec 31 '21 at 16:12
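A minimal sketch of how that output could be captured in R, assuming key-based ssh access and an `unzip` binary on the server. `read_csv_from_command` is a hypothetical helper name, and `user@remote`, `file.zip`, and `innerfile.csv` are the placeholders from the comment above:

```r
# Hypothetical helper: run a shell command and parse its stdout as CSV.
# read.csv() opens the pipe, reads from it, and closes it again, so no
# file is written on the client side.
read_csv_from_command <- function(cmd, ...) {
  read.csv(pipe(cmd), ...)
}

# Assumed usage (placeholders; needs ssh access and unzip on the server):
# df <- read_csv_from_command("ssh user@remote 'unzip -p file.zip innerfile.csv'")
```

The zip is still read in full, but on the server; only the decompressed CSV text travels over the connection.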
@r2evans Most FTP servers actually support downloading only part of a file. After all, that's what allows clients to resume an interrupted FTP download. So it's not really true that it is not possible. Though I do not have an R solution. But for a proof of concept, see for example https://stackoverflow.com/q/53143518/850848 — Martin Prikryl, Jan 02 '22 at 19:11
@MartinPrikryl Sure, but is there an automated or at least easy way to "know" the byte range for downloading? I recognize that my claim was technically inaccurate, but I suspect (without testing) that the user would need to know at least a little about the zip file composition to be able to define the byte range. Optional compression, and the presence/order/size of other files make that non-trivial without at least a 2-step operation (where the first download would need to download sufficient bytes to be able to reconstruct a table of its contents ... does that even exist?). — r2evans, Jan 02 '22 at 19:32
@r2evans Yes, it's non-trivial, but possible. One has to read the ZIP file's central directory at the end of the archive to determine the position of the desired file in the ZIP file. — Martin Prikryl, Jan 02 '22 at 21:18
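To illustrate the first step of that idea, here is a rough sketch in R that parses the End of Central Directory (EOCD) record from the trailing bytes of an archive. It assumes those bytes have already been fetched somehow (the ranged FTP download itself is not shown), that the archive is not ZIP64, and `parse_eocd` is a hypothetical name, not an existing R function:

```r
# Parse the EOCD record from a raw vector holding the last bytes of a ZIP file.
# The record is 22 bytes long and may be followed by an archive comment.
parse_eocd <- function(tail_bytes) {
  sig <- as.raw(c(0x50, 0x4b, 0x05, 0x06))  # "PK" 0x05 0x06
  n <- length(tail_bytes)
  if (n < 22) stop("need at least 22 bytes")
  # Little-endian unsigned readers; positions are 1-based within the record.
  u16 <- function(rec, pos) sum(as.integer(rec[pos:(pos + 1)]) * 256^(0:1))
  u32 <- function(rec, pos) sum(as.integer(rec[pos:(pos + 3)]) * 256^(0:3))
  # Scan backwards, because a trailing comment may follow the record.
  for (i in seq(n - 21, 1)) {
    if (identical(tail_bytes[i:(i + 3)], sig)) {
      rec <- tail_bytes[i:(i + 21)]
      return(list(
        n_entries = u16(rec, 11),  # total number of entries in the archive
        cd_size   = u32(rec, 13),  # size of the central directory in bytes
        cd_offset = u32(rec, 17)   # byte offset where the central directory starts
      ))
    }
  }
  stop("EOCD signature not found")
}
```

With `cd_offset` and `cd_size` known, a second ranged request could fetch the central directory, whose entries record each member's name, compressed size, and local header offset. A third request would then fetch just the bytes of the wanted member.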
@MartinPrikryl ... and I doubt that there's an "R function" for doing that ;-) — r2evans, Jan 02 '22 at 21:58
