I need to read an Excel file using PySpark, but I get the error below when I try.

Alternatively, is there a way to read the Excel file with pandas and convert it into a PySpark DataFrame? Either approach is fine.

lat_data=spark.read.format('com.crealytics.spark.excel').option("header","true").load("a1.xlsx")

error: Py4JJavaError: An error occurred while calling o756.load. : java.lang.ClassNotFoundException: Failed to find data source: com.crealytics.spark.excel.

Thanks in advance.

RK.
  • https://stackoverflow.com/questions/59854917/reading-excel-xlsx-file-in-pyspark – Devang Sanghani Feb 14 '22 at 07:23
  • I installed the library below on my cluster from the Databricks workspace and it started working: Clusters > your cluster > Libraries > Install new > Maven > com.crealytics:spark-excel_2.12:0.13.5. Thanks for the help. – RK. Feb 14 '22 at 17:27
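
Outside Databricks, the same Maven coordinate can usually be supplied through Spark's `spark.jars.packages` setting when the session is created. The following is a minimal sketch, not a verified setup for this cluster: the helper names `spark_excel_coordinate`, `build_session` and `read_excel_with_spark` are hypothetical, and the Scala suffix (2.12) and version (0.13.5) are simply the values from the comment above, which must match the Scala build of your Spark installation.

```python
def spark_excel_coordinate(scala_version="2.12", version="0.13.5"):
    """Build the Maven coordinate for spark-excel.

    The defaults are the values quoted in the comment above (an assumption
    about your cluster, not a recommendation).
    """
    return f"com.crealytics:spark-excel_{scala_version}:{version}"


def build_session(app_name="excel-reader"):
    """Create a SparkSession that pulls spark-excel from Maven at startup."""
    # Imported lazily so this sketch can be loaded without PySpark installed.
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder
        .appName(app_name)
        # spark.jars.packages tells Spark to resolve the package and add it to the classpath.
        .config("spark.jars.packages", spark_excel_coordinate())
        .getOrCreate()
    )


def read_excel_with_spark(spark, path):
    """Read an .xlsx file with the data source used in the question."""
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("header", "true")
        .load(path)
    )
```

With these helpers, `read_excel_with_spark(build_session(), "a1.xlsx")` corresponds to the call in the question. The setting only takes effect if no session is already running, because `getOrCreate()` returns an existing session unchanged.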

1 Answer

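The ClassNotFoundException means the crealytics spark-excel data source is not on the cluster's classpath. It is a JVM library, so it has to be installed as a Maven package (for example com.crealytics:spark-excel_2.12:0.13.5, as in the comment above), not through pip. If you would rather read the file with pandas, what you install via pip is the Excel reader that pandas uses. For .xlsx files that is openpyxl, since xlrd 2.0 and later reads only .xls files:

pip install openpyxl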
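
For the pandas route asked about in the question, a minimal sketch could look like the following. It assumes pandas and openpyxl are installed on the driver and that the file is small enough to fit in driver memory. The helper name `excel_to_spark` is hypothetical, and `spark` is an existing SparkSession.

```python
def excel_to_spark(spark, path, sheet_name=0):
    """Read an Excel sheet with pandas and convert it to a PySpark DataFrame."""
    # Imported lazily so the helper can be defined before pandas is installed.
    import pandas as pd

    # engine="openpyxl" selects the pandas reader for .xlsx files.
    pdf = pd.read_excel(path, sheet_name=sheet_name, engine="openpyxl")

    # createDataFrame accepts a pandas DataFrame and infers the Spark schema from it.
    return spark.createDataFrame(pdf)
```

Usage would be `lat_data = excel_to_spark(spark, "a1.xlsx")`. Because pandas loads the whole sheet on the driver first, this suits small files. For large workbooks the spark-excel data source is the better fit.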

Luiz Viola