In PySpark on Databricks, I am using a mounted data lake container with the following content:
dbutils.fs.ls("/mnt/adlslirkov/layers/raw/BM/2022/01/")
Out[91]: [FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/01/', name='01/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/02/', name='02/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/03/', name='03/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/04/', name='04/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/05/', name='05/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/06/', name='06/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/07/', name='07/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/08/', name='08/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/09/', name='09/', size=0),
FileInfo(path='dbfs:/mnt/adlslirkov/layers/raw/BM/2022/01/10/', name='10/', size=0)]
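For reference, each element returned by dbutils.fs.ls is a FileInfo object, so the day folder can also be read straight from its name attribute rather than by position. A minimal sketch, using the same mount path as above:

# Each FileInfo exposes .path, .name and .size attributes
# (some runtimes also add .modificationTime), so the folder
# name is available directly.
for entry in dbutils.fs.ls("/mnt/adlslirkov/layers/raw/BM/2022/01/"):
    day = entry.name.rstrip("/")   # '01', '02', ..., '10'
    print(day, entry.path)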
Using a loop, I would like to create a DataFrame for each of the files within these folders. I would like to end up with 10 DataFrames with names like df_bm_01012022, df_bm_02012022, etc., where the first two digits are the name of the folder the file is in.
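In other words, each name is built from the day folder plus the month and year, roughly like this (just an illustration of the naming scheme):

folder = "01"                              # day folder from the listing
df_name = "df_bm_" + folder + "012022"     # -> 'df_bm_01012022'

This is what I have right now: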
df_default_name = "df_bm_"
df_default_path = "/mnt/adlslirkov/layers/raw/BM/2022/01/"
df_dict = {}
for i in dbutils.fs.ls("/mnt/adlslirkov/layers/raw/BM/2022/01/"):
    # convert the FileInfo element to a list so the folder name can be picked out
    lst_paths = list(i)
    # build the dictionary: DataFrame name -> folder path
    df_dict[df_default_name + lst_paths[-2].replace('/', '') + "012022"] = df_default_path + lst_paths[-2].replace('/', '') + '/'
for i, y in df_dict.items():
    i = spark.read.format("parquet").option("header", True).load(y)
    i.display()
The last for loop displays all 10 DataFrames at once. However, I would like to be able to access each one of them by its name: for example, if in the next cell I write display(df_bm_07012022), I would like to get the DataFrame for that particular day. How should I do that?
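For what it's worth, I know I could keep the loaded DataFrames in the dictionary itself and look one up by key, as in the sketch below (same names and path as above), but I am specifically after the bare variable names:

# Sketch: map each intended DataFrame name directly to the loaded DataFrame,
# then pull a single one out by its key.
df_dict = {}
for entry in dbutils.fs.ls("/mnt/adlslirkov/layers/raw/BM/2022/01/"):
    day = entry.name.replace("/", "")                # '01', '02', ..., '10'
    df_dict["df_bm_" + day + "012022"] = spark.read.parquet(entry.path)

display(df_dict["df_bm_07012022"])   # works, but I want display(df_bm_07012022)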