I have an Excel (.xlsx) file in the data lake. I need to read that file into a PySpark DataFrame, and I do not want to use the pandas library.
I have installed the crealytics spark-excel library on my Databricks cluster and tried the code below:
dbutils.fs.cp('/path/to/excel/file', '/FileStore/tables/', True)
# Spark reads DBFS paths directly, so no /dbfs prefix is needed here
path = '/FileStore/tables/myfile1.xlsx'
excel_df = spark.read.format("com.crealytics.spark.excel").option("header", "true").option("inferSchema", "true").load(path)
I'm getting the error below:
java.lang.NoSuchMethodError: org.apache.commons.io.IOUtils.byteArray(I)[B
Am I missing something here, or is there another approach I can try that does not use pandas? I also need to read multiple sheets from the Excel file. Please suggest.
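For the multiple-sheet part, this is roughly what I would expect to write once the read itself works. It is only a sketch: it assumes spark-excel's dataAddress option selects a sheet by name (in the form 'SheetName'!A1), and sheet_address, read_sheets, and the sheet names in the usage comment are hypothetical names I made up for illustration.

```python
def sheet_address(sheet_name, top_left_cell="A1"):
    """Build a dataAddress string such as 'Sheet1'!A1 (assumed spark-excel format)."""
    return "'{}'!{}".format(sheet_name, top_left_cell)


def read_sheets(spark, path, sheet_names):
    """Read each named sheet into its own DataFrame, keyed by sheet name."""
    frames = {}
    for name in sheet_names:
        frames[name] = (
            spark.read.format("com.crealytics.spark.excel")
            .option("header", "true")
            .option("inferSchema", "true")
            # Assumption: dataAddress picks the sheet and the starting cell
            .option("dataAddress", sheet_address(name))
            .load(path)
        )
    return frames


# Hypothetical usage in a Databricks notebook, where `spark` already exists:
# dfs = read_sheets(spark, "/FileStore/tables/myfile1.xlsx", ["Sheet1", "Sheet2"])
# dfs["Sheet1"].show()
```

This gives one DataFrame per sheet without going through pandas; it does not address the NoSuchMethodError above.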