I have three files, file_1.csv, file_2.csv, and file_3.json, inside a tar.gz file. I want to read file_1.csv into a Spark DataFrame,
something like this:
df = spark.read.csv("s3://my_bucket/key/my_file_.tar.gz/file_1.csv")
There isn't a good way to access a single file inside a tarball (.tar.gz/.tar) without extracting it first, so a path that points into the archive, like the one above, won't work. Here's a reference to someone else's question about opening files in a tarball without extracting them first.
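As a rough illustration of the "extract first" approach, here is a minimal sketch that uses Python's standard tarfile module to pull one member out of the archive before handing it to Spark. The helper name extract_member and the /tmp paths are hypothetical, and the sketch assumes the archive has already been copied from S3 to local disk.

```python
import os
import shutil
import tarfile


def extract_member(tar_path, member_name, dest_dir):
    """Extract one member of a .tar/.tar.gz archive and return its local path.

    Hypothetical helper for illustration; not part of Spark or tarfile.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, none) itself.
    with tarfile.open(tar_path, mode="r:*") as archive:
        member = archive.getmember(member_name)  # raises KeyError if absent
        source = archive.extractfile(member)
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        # Write under dest_dir using only the base name of the member.
        target = os.path.join(dest_dir, os.path.basename(member_name))
        with source, open(target, "wb") as out:
            shutil.copyfileobj(source, out)
    return target


# Usage sketch (assumes an existing SparkSession named `spark` and a
# local copy of the archive; both paths are placeholders):
# csv_path = extract_member("/tmp/my_file_.tar.gz", "file_1.csv", "/tmp/extracted")
# df = spark.read.csv("file://" + csv_path)
```

Note that a local path like this is only visible on the machine that ran the extraction. On a multi-node cluster you would probably need to upload the extracted file_1.csv back to S3 (or another shared store) and read it from there.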