
I want to implement the logic below in Azure Databricks using PySpark. I have a file with multiple sheets in it, stored on ADLS Gen2. I want to read the data from all sheets, combine it into a single file, and write that file back to a location on ADLS Gen2.

Note: all sheets have the same schema (Id, Name).

The final output file should contain the data from all sheets. I also need to add a column that stores the sheet name for each row.


1 Answer


You can use the following approach:

  • Use pandas to read all worksheets of the workbook in one call (`pd.read_excel` with `sheet_name=None`).
  • Add a column holding the sheet name to each DataFrame, then concatenate them into a single pandas DataFrame with `pd.concat`.
  • Convert the pandas DataFrame into a PySpark DataFrame with `spark.createDataFrame`.
  • Apply whatever business logic you need and write the result back to ADLS Gen2.
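The steps above can be sketched as follows. This is a minimal sketch, not a tested Databricks job: the helper `combine_sheets` and all paths are my own placeholders, and the commented-out part assumes the ADLS container is already mounted (or reachable via an `abfss://` path with credentials configured) and that a `spark` session exists, as it does in a Databricks notebook.

```python
import pandas as pd

def combine_sheets(sheets):
    """Combine a dict of {sheet_name: DataFrame}, as returned by
    pd.read_excel(..., sheet_name=None), into one DataFrame with an
    extra sheetName column identifying the source sheet of each row."""
    frames = []
    for name, df in sheets.items():
        df = df.copy()
        df["sheetName"] = name  # record which sheet the row came from
        frames.append(df)
    return pd.concat(frames, ignore_index=True)

# In a Databricks notebook the end-to-end flow would look roughly like
# this (paths are hypothetical; reading .xlsx via pandas also needs the
# openpyxl package installed on the cluster):
#
# sheets = pd.read_excel("/dbfs/mnt/raw/input.xlsx", sheet_name=None)
# combined = combine_sheets(sheets)
# sdf = spark.createDataFrame(combined)   # pandas -> PySpark DataFrame
# sdf.write.mode("overwrite").csv("/mnt/curated/output")
```

Note that this pulls the whole workbook into driver memory via pandas, which is fine for typical Excel-sized data but not for very large files.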