I'm using PySpark to try to read a zip file from blob storage. I want to unzip the file once it's loaded, and then write the unzipped CSVs back to blob storage.
I'm following this guidance, which explains how to unzip the file once it's been read: https://docs.databricks.com/_static/notebooks/zip-files-python.html
But it doesn't explain how to read the zip from blob storage in the first place. I have the following code:
file_location = "path_to_my.zip"
df = sqlContext.read.load(file_location)
I expected this to load the zip into Databricks as df, and from there I could follow the advice in the article to unzip it, load the CSVs into DataFrames, and then write those DataFrames back to blob storage.
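For what it's worth, the unzip step on its own works for me with plain Python's zipfile module once the file is on local disk. Here's a minimal sketch of that part (the paths are made up, and it builds a tiny example zip just so the snippet is self-contained; on Databricks I assume the zip would first need to be copied from blob storage to the driver's local disk, e.g. with dbutils.fs.cp):

```python
import os
import tempfile
import zipfile

# Hypothetical local paths -- in practice the zip would first be copied
# down from blob storage to the driver's local filesystem.
work_dir = tempfile.mkdtemp()
zip_path = os.path.join(work_dir, "example.zip")
out_dir = os.path.join(work_dir, "unzipped")

# Build a small example zip containing one CSV, just so this snippet runs.
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("data.csv", "id,value\n1,a\n2,b\n")

# Extract every member of the archive to out_dir.
with zipfile.ZipFile(zip_path, "r") as zf:
    zf.extractall(out_dir)

extracted = os.listdir(out_dir)
print(extracted)  # ['data.csv']
```

So it's really only the first step, getting the zip out of blob storage, that I'm stuck on.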
Any ideas on how to initially read the zip file from blob storage using PySpark?
Thanks,