I'm trying to get a file list (and then extract specific files) from a large (300-600 MB) remotely-hosted tar.gz file -- without downloading the entire file. However, I don't quite understand whether my file should be treated as binary or not, or how to handle the embedded nuls without modifying the file. I've seen questions that address reading remote gzipped binary files or untarring local gzipped files, but none about untarring remotely-hosted gzipped tar files.
I've tried using gzfile:
example.url <- "https://neon-microbial-raw-seq-files.s3.data.neonscience.org/2017/BMI_B69RN_ITS_R1_fastq.tar.gz"
con <- gzfile(example.url)
test.list <- utils::untar(
  tarfile = con,
  list = TRUE)
which returns:
Error in readBin(con, "raw", n = 512L) :
can only read from a binary connection
If I run open(con, "rb"), I get an error saying the file doesn't exist. Opening the connection as binary without gzfile() instead gives an error about embedded nulls:
bcon <- url("https://neon-microbial-raw-seq-files.s3.data.neonscience.org/2017/BMI_B69RN_ITS_R1_fastq.tar.gz")
open(bcon, "rb")
test.list <- utils::untar(
  tarfile = bcon,
  list = TRUE)
which returns:
Error in rawToChar(block[seq_len(ns)]) :
embedded nul in string: '\037\x8b\b\0\x9e\x9c\xbbZ\0\003\xec[is䶙\x9e\xcf\xfe\025\xfe\xc8\003\xea\xe6\t\x9eM\022\004\001T\xaa\034'\xb1\xb9\x95\xfd65\xb5\xf1Ʈ\xb5=\x8e=\xaeڟ\xbf\xef\001\xb2[Rk\xd4s9\x9br \x89\r\002 \xc0\026\037>\xef\x89\xc3\xf1p\x9c\xbex\xfd\xe3\u07ff\xf8\xee\xc7\xffy\xf1iJ\xc2\xe5\xa9\xcf$K'
Lastly, using gzcon returns a different error involving embedded nulls:
test.list <- utils::untar(
  tarfile = gzcon(url(example.url)),
  list = TRUE)
which returns:
Error in rawToChar(block[seq_len(ns)]) :
embedded nul in string: '././@LongLink\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\00000000\00000000\00000000\000000000201\000000000000\0011556\0 L\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0ustar \0root\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0root'
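For context on this last error: the string shown looks like a decompressed tar header block, which suggests gzcon is decompressing correctly and the failure comes from converting a nul-padded block to a string. The following is only a minimal sketch of reading such a block field by field, assuming the standard layout of a 512-byte header with the name in bytes 1-100 and an octal size in bytes 125-136. The entry name reads.fastq, the helper field(), and the hand-built header are hypothetical stand-ins so the code runs without network access; the header has no checksum and is not a complete, valid archive.

```r
# Hand-build one 512-byte header block (hypothetical entry, 5 bytes long)
# and write it gzip-compressed to a temporary file.
header <- raw(512)
name <- charToRaw("reads.fastq")
header[seq_along(name)] <- name                     # name field, nul-padded
header[125:135] <- charToRaw(sprintf("%011o", 5L))  # size field, octal digits
archive <- tempfile(fileext = ".tar.gz")
out <- gzfile(archive, "wb")
writeBin(header, out)
close(out)

# Read the block back through a decompressing binary connection.
con <- gzcon(file(archive, "rb"))
block <- readBin(con, "raw", n = 512L)
close(con)

# Hypothetical helper: keep only the bytes before the first nul, then convert.
# Calling rawToChar() on the whole block would fail with "embedded nul in string".
field <- function(raw_bytes) {
  nul <- which(raw_bytes == as.raw(0))
  if (length(nul)) raw_bytes <- raw_bytes[seq_len(nul[1] - 1L)]
  rawToChar(raw_bytes)
}

entry.name <- field(block[1:100])
entry.size <- strtoi(field(block[125:136]), base = 8L)
```

The same field-by-field reading should in principle work on a block read from gzcon(url(example.url)), though I have not confirmed that against the real archive, and the ././@LongLink entry above would need extra handling because the real file name follows in the next block.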
Any help is appreciated!