
I have three files (file_1.csv, file_2.csv, and file_3.json) inside a tar.gz file. I want to read file_1.csv into a Spark DataFrame.

Something like this:

df = spark.read.csv("s3://my_bucket/key/my_file_.tar.gz/file_1.csv")
dsl1990

1 Answer


There isn't a really good way to access a file inside a tarball (.tar or .tar.gz) without extracting the files first, so a path that points inside the archive, like the one in your example, won't resolve to file_1.csv. Here's a reference to someone else's question about opening files in a tarball without extracting first.
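If extracting first is acceptable, a minimal sketch of that approach might look like the following. It assumes the tarball has already been downloaded from S3 to a local path, that the CSV has a header row, and that Spark can see the extracted file (for example in local mode; on a cluster you would probably need to copy the extracted file to shared storage first). The helper names `extract_member` and `read_csv_from_tarball` are made up for illustration.

```python
import os
import shutil
import tarfile
import tempfile


def extract_member(tar_path, member_name, dest_dir):
    """Extract one member of a .tar.gz archive into dest_dir and return its path."""
    with tarfile.open(tar_path, mode="r:gz") as archive:
        member = archive.getmember(member_name)  # raises KeyError if the name is absent
        source = archive.extractfile(member)
        if source is None:
            raise ValueError(f"{member_name} is not a regular file")
        # Write under the base name only, so archive paths cannot escape dest_dir.
        dest_path = os.path.join(dest_dir, os.path.basename(member_name))
        with source, open(dest_path, "wb") as target:
            shutil.copyfileobj(source, target)
    return dest_path


def read_csv_from_tarball(spark, tar_path, member_name="file_1.csv"):
    """Hypothetical helper: extract one CSV and load it with an existing SparkSession."""
    dest_dir = tempfile.mkdtemp()
    local_path = extract_member(tar_path, member_name, dest_dir)
    # Assumption: the file:// path is visible to every executor (true in local mode).
    # header=True is also an assumption about the CSV layout.
    return spark.read.csv("file://" + local_path, header=True)
```

With a SparkSession named `spark` and the archive saved locally, `df = read_csv_from_tarball(spark, "/tmp/my_file_.tar.gz")` would then give you the DataFrame for file_1.csv.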

jg925