I am working on PySpark code in Databricks that reads CSV files and saves them to an Azure storage container. Reading and writing the CSV files is straightforward with the DataFrame reader and writer APIs. However, some of our CSV files have column headers that contain spaces, so we decided to use the column mapping functionality that Databricks provides to support this use case. To keep our code generic and common for all CSV files, whether or not their column headers contain spaces or special characters, we create every Delta table with column mapping enabled.
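For context, a minimal sketch of the read/write flow I am describing (the storage account, container and path names below are placeholders, and the exact calls are illustrative rather than my actual code):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read a CSV file from the raw container.
df = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("abfss://raw@mystorageaccount.dfs.core.windows.net/input/sample.csv")
)

# Write it out as Delta to the curated container. A header containing spaces
# would make this plain write fail, which is why we enable column mapping on
# the tables (see the CREATE TABLE script in the update below).
(
    df.write
    .format("delta")
    .mode("overwrite")
    .save("abfss://curated@mystorageaccount.dfs.core.windows.net/delta/sample_table")
)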
I see an odd folder structure in the Azure storage container where the actual parquet files get created.
Image of delta table physical parquet files without using Column Mapping:
Image of delta table physical parquet files using Column Mapping:
I am not doing any partitioning in either case.
I just want to know why those randomly named folders get created and what their significance is when someone uses the column mapping feature of Databricks.
I have tried searching for detailed documentation on this but could not find anything useful.
Update:
I am using a simple CREATE TABLE script that adds the three TBLPROPERTIES mentioned in the column mapping documentation:
CREATE TABLE delta_table (
  `Column With Space` STRING,  -- placeholder columns; the real schema matches the CSV
  other_column INT
)
USING DELTA
TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)
Ref Doc: https://learn.microsoft.com/en-us/azure/databricks/delta/delta-column-mapping
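For completeness, the load step looks roughly like this (again, the table, column and path names are placeholders and the exact API used here is illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the CSV with its original headers, which may contain spaces.
df = (
    spark.read
    .option("header", "true")
    .csv("abfss://raw@mystorageaccount.dfs.core.windows.net/input/sample.csv")
)

# Append into the pre-created, column-mapped Delta table. The logical column
# names (with spaces) stay in the table schema, while Delta stores sanitized
# physical column names in the parquet files.
df.write.format("delta").mode("append").saveAsTable("delta_table")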