
I want to implement the logic below in Azure Databricks using PySpark. I have a file with multiple sheets in it, stored on ADLS Gen2. I want to read the data from all sheets, combine it into a single file, and write that file back to a location on ADLS Gen2.

Note: all sheets have the same schema (Id, Name).

The final output file should contain the data from all sheets. I also need to add a column that stores the sheet name for each row.


1 Answer


You can use the following approach:

  • Use pandas to read all worksheets of the workbook in one call (`pd.read_excel` with `sheet_name=None`).
  • Add a column holding the sheet name to each DataFrame, then concatenate them into a single pandas DataFrame with `pd.concat`.
  • Convert the pandas DataFrame into a PySpark DataFrame with `spark.createDataFrame`.
  • Apply whatever business logic you need and write the result back to ADLS Gen2.
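The steps above can be sketched as follows. This is a minimal sketch, not a tested Databricks job: the helper `combine_sheets` and all paths are my own placeholders, and the commented-out part assumes the ADLS container is already mounted (or reachable via an `abfss://` path with credentials configured) and that a `spark` session exists, as it does in a Databricks notebook.

```python
import pandas as pd

def combine_sheets(sheets):
    """Combine a dict of {sheet_name: DataFrame}, as returned by
    pd.read_excel(..., sheet_name=None), into one DataFrame with an
    extra sheetName column identifying the source sheet of each row."""
    frames = []
    for name, df in sheets.items():
        df = df.copy()
        df["sheetName"] = name  # record which sheet the row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# In a Databricks notebook the end-to-end flow would look roughly like
# this (paths are hypothetical; reading .xlsx via pandas also needs the
# openpyxl package installed on the cluster):
#
# sheets = pd.read_excel("/dbfs/mnt/raw/input.xlsx", sheet_name=None)
# combined = combine_sheets(sheets)
# sdf = spark.createDataFrame(combined)   # pandas -> PySpark DataFrame
# sdf.write.mode("overwrite").csv("/mnt/curated/output")
```

Note that this pulls the whole workbook into driver memory via pandas, which is fine for typical Excel-sized data but not for very large files.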