PySpark: issue reading Excel data with "useHeader", "false"

Question

I'm trying to read some Excel data into a PySpark DataFrame. I'm using the library 'com.crealytics:spark-excel_2.11:0.11.1'.

My data has no header row. Reading works when the range starts at column A, but when I read a range further to the right, such as columns N and O, I get a DataFrame containing only nulls.

My data is as below:

For example, when reading from A2:B4, I get the correct DataFrame:

+-----+-----+
|  _c0|  _c1|
+-----+-----+
|data2|data6|
|data3|data7|
|data4|data8|
+-----+-----+

But with the same code, changing only 'dataAddress' to N2:O4, I get a DataFrame of nulls:

+----+----+
| _c0| _c1|
+----+----+
|null|null|
|null|null|
|null|null|
+----+----+

My code:

from pyspark.sql import SparkSession

# No Python import is needed for spark-excel: the data source comes from
# the JAR supplied via --packages and is selected by its format name.
spark = SparkSession.builder.appName("excel_try").enableHiveSupport().getOrCreate()

# Read the headerless range N2:O4 and let Spark infer the column types.
exldf = spark.read.format("com.crealytics.spark.excel")\
    .option("dataAddress","N2:O4")\
    .option("useHeader","false")\
    .option("inferSchema","true")\
    .load("/path/excel_false.xlsx")

exldf.show()

spark.stop()

Run using:

spark-submit --master yarn --packages com.crealytics:spark-excel_2.11:0.11.1 excel_false.py

Can someone please help with a solution?

The [documentation](https://github.com/crealytics/spark-excel) has no option named `useHeader`. I guess you mean `header`? - blackbishop, Dec 31 '20 at 16:47
Also, I couldn't reproduce the issue using the code you provided. - blackbishop, Dec 31 '20 at 17:14
`useHeader` is the option name used in version `2.11:0.11.1`. What issue are you facing when trying to reproduce it? - Abhishek Choudhary, Jan 01 '21 at 06:39
I used version `spark-excel_2.11-0.13.6` with the option `header` and it works fine. - blackbishop, Jan 01 '21 at 15:54
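
To make the difference discussed in the comments concrete, here is a minimal sketch of the same read with the header option name passed in as a parameter. The helper names `excel_read_options` and `read_excel_range` are hypothetical and not part of spark-excel. The sketch assumes only what the comments state: `useHeader` is the option name in 0.11.1, and `header` is reported to work in 0.13.6. Whether switching versions fixes the null columns for N2:O4 is not confirmed here.

```python
def excel_read_options(data_address, header_option="header"):
    """Build spark-excel reader options for a headerless cell range.

    header_option is "header" (reported to work with 0.13.6 in the
    comments) or "useHeader" (the name used by 0.11.1 in the question).
    """
    return {
        "dataAddress": data_address,
        header_option: "false",   # the range contains data only, no header row
        "inferSchema": "true",
    }


def read_excel_range(spark, path, data_address, header_option="header"):
    """Read one cell range from an Excel file into a DataFrame.

    `spark` is an existing SparkSession that was started with the
    spark-excel package on its classpath (for example via --packages).
    """
    options = excel_read_options(data_address, header_option)
    return (
        spark.read.format("com.crealytics.spark.excel")
        .options(**options)
        .load(path)
    )
```

With a session like the one in the question, `read_excel_range(spark, "/path/excel_false.xlsx", "N2:O4")` would issue the read with `header`, and passing `header_option="useHeader"` reproduces the original call.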
