Questions tagged [spark-excel]

The spark-excel tag is for questions about reading Excel files (xlsx) through Apache Spark.

Multiple libraries help developers read Excel files through Apache Spark. The most common ones are:

45 questions
19 votes, 5 answers

How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

I have a large Excel (xlsx and xls) file with multiple sheets and I need to convert it to an RDD or DataFrame so that it can be joined to other DataFrames later. I was thinking of using Apache POI to save it as a CSV and then read the CSV into a DataFrame. But if…
koiralo
11 votes, 2 answers

Reading Excel (.xlsx) file in pyspark

I am trying to read a .xlsx file from local path in PySpark. I've written the below code: from pyspark.shell import sqlContext from pyspark.sql import SparkSession spark = SparkSession.builder \ .master('local') \ .appName('Planning')…
OMG
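For orientation, a minimal PySpark sketch of a read through the spark-excel data source follows. It assumes the `com.crealytics:spark-excel` package is already on the classpath; the helper names `excel_read_options` and `read_excel`, the default sheet name, and the start cell are hypothetical. The header option key is left as a parameter because the excerpts on this page use `useHeader`, while newer releases of the library, as far as I know, name it `header`.

```python
def excel_read_options(sheet, first_cell="A1", has_header=True, header_key="header"):
    """Build reader options for the spark-excel data source (hypothetical helper)."""
    return {
        # dataAddress selects the sheet and the top-left cell to start reading from
        "dataAddress": f"'{sheet}'!{first_cell}",
        # Spark reader options are passed as strings
        header_key: "true" if has_header else "false",
    }


def read_excel(spark, path, sheet="Sheet1", **kwargs):
    """Read one sheet into a DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.read.format("com.crealytics.spark.excel")
        .options(**excel_read_options(sheet, **kwargs))
        .load(path)
    )
```

With an older release, something like `read_excel(spark, path, header_key="useHeader")` would be the assumed equivalent.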
5 votes, 1 answer

How to create an Excel file with multiple sheets from multiple DataFrames in Scala/Spark?

In a Scala/Spark application I created two different DataFrames. My task is to create one Excel file with two sheets, one for each DataFrame. I decided to use the spark-excel library but I am a little bit confused. As far as I understand the future excel file is…
Nurzhan Nogerbek
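One commonly suggested approach to this kind of task is to write each DataFrame to the same file with a different sheet name in `dataAddress`, appending after the first write. The sketch below shows that idea in PySpark; it assumes a spark-excel release that supports `dataAddress` and append mode on write, and the helper name `write_sheets` and the `header_key` parameter are hypothetical.

```python
def write_sheets(frames_by_sheet, path, header_key="header"):
    """Write each DataFrame to its own sheet of one workbook (hypothetical helper).

    `frames_by_sheet` maps a sheet name to a Spark DataFrame.
    """
    for i, (sheet, df) in enumerate(frames_by_sheet.items()):
        (
            df.write.format("com.crealytics.spark.excel")
            # Assumption: the sheet is chosen through the dataAddress option
            .option("dataAddress", f"'{sheet}'!A1")
            .option(header_key, "true")
            # The first write replaces any existing file; later writes add sheets to it
            .mode("overwrite" if i == 0 else "append")
            .save(path)
        )
```

Because the writes happen one after another on the same path, the order of the dictionary decides which sheet is created first.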
5 votes, 1 answer

What are the mandatory options for loading an Excel file?

I have loaded an Excel file from S3 using the below syntax, but I am wondering about the options that need to be set here. Why is it mandatory to set all the below options for loading an Excel file? None of these options are mandatory for loading…
Garipaso
4 votes, 1 answer

How to write a Dataset to an Excel file using the hadoop office library in Apache Spark (Java)

Currently I am using com.crealytics.spark.excel to read an Excel file, but using this library I can't write the dataset to an Excel file. This link says that using the hadoop office library (org.zuinnote.spark.office.excel) we can read and write to…
Shashi Kumar
3 votes, 3 answers

How to read multiple Excel files and concatenate them into one Apache Spark DataFrame?

Recently I wanted to do the Spark Machine Learning Lab from Spark Summit 2016. The training video is here and the exported notebook is available here. The dataset used in the lab can be downloaded from the UCI Machine Learning Repository. It contains a set of…
tomaskazemekas
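A rough PySpark sketch of the "read several files, then concatenate" idea: read each file into its own DataFrame and fold them together with `unionByName`. It assumes every file shares the same column names and that the spark-excel package is available; the helper names `union_all` and `read_many_excel` and the `header_key` parameter are hypothetical.

```python
from functools import reduce


def union_all(frames):
    """Fold a sequence of DataFrames into one by matching column names."""
    frames = list(frames)
    if not frames:
        raise ValueError("at least one DataFrame is required")
    return reduce(lambda left, right: left.unionByName(right), frames)


def read_many_excel(spark, paths, header_key="header"):
    """Read each Excel file and concatenate the results (hypothetical helper)."""
    frames = [
        spark.read.format("com.crealytics.spark.excel")
        .option(header_key, "true")
        .load(path)
        for path in paths
    ]
    return union_all(frames)
```

`unionByName` is used rather than `union` so that a different column order in one file does not silently misalign the data.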
2 votes, 0 answers

Azure Databricks - Import statement failing in scala cells

Hope you are all doing well. We have been facing a weird issue with our notebooks. We are using a couple of Scala packages. When we import the Scala package in a Scala cell, the imports fail with the error mentioned below. Here I am considering…
2 votes, 0 answers

Write Spark Dataset to Excel File along with partitioning

I have a Dataset similar to the below structure:

col_A  col_B  date
1      5      2021-04-14
2      7      2021-04-14
3      5      2021-04-14
4      9      2021-04-14

I am trying to use the below code in Spark…
KCK
2 votes, 2 answers

spark-excel datatype issues

I am using the spark-excel package for processing MS Excel files using Spark 2.2. Some of the files fail to load as a Spark DataFrame with the exception below. If someone has faced this issue, can you please help to fix such data type…
nilesh1212
2 votes, 2 answers

How to write a Dataset object to Excel in Spark Java?

I am reading an Excel file using the com.crealytics.spark.excel package. Below is the code to read an Excel file in Spark Java. Dataset SourcePropertSet = sqlContext.read() .format("com.crealytics.spark.excel") …
BHANUMATHI H M
1 vote, 1 answer

Why is the DataFrame not throwing a RuntimeException with the "FAILFAST" option in Spark while reading using com.crealytics.spark.excel?

schema = df = spark.read.format("com.crealytics.spark.excel").\ option("useHeader", "true").\ option("mode", "FAILFAST"). \ schema(schema).\ option("dataAddress", "Sheet1"). \ …
1 vote, 0 answers

PySpark - issue reading Excel data with "useHeader", "false"

I'm trying to read some Excel data into a PySpark DataFrame. I'm using the library 'com.crealytics:spark-excel_2.11:0.11.1'. I don't have a header in my data. I'm able to read successfully when reading from column A onwards, but when I'm trying to…
1 vote, 1 answer

How to specify individual sheet names while writing multiple org.apache.spark.sql.Dataset objects into an .xls file using crealytics / spark-excel in Java?

I am trying to write different Java Datasets into an excel file which will contain multiple sheets inside it using crealytics/spark-excel library. com.crealytics
1 vote, 0 answers

How to read Excel file column data which comes from a formula using Apache Spark

I am trying to read an Excel file in Spark using the crealytics library, but my code fails because one of the columns refers to a sheet to populate its value through a VLOOKUP formula. I have been trying with "crealytics"…
Anand Jha
1 vote, 0 answers

Is there any way to set the style of the Excel file via Apache POI in Scala/Spark application?

In my Scala application I use a Spark plugin (spark-excel) for creating and writing Excel files with several new sheets via Apache POI. import spark.implicits._ val df = Seq( ("2019-01-01 00:00:00", "7056589658"), ("2019-02-02 00:00:00",…
Nurzhan Nogerbek