Questions tagged [spark-excel]

The spark-excel tag covers reading Excel files (xlsx) through Apache Spark.

Several libraries help developers read Excel files through Apache Spark. The most common ones are:

Crealytics Spark-Excel
- GitHub: https://github.com/crealytics/spark-excel
- Maven: https://mvnrepository.com/artifact/com.crealytics/spark-excel

Spark HadoopOffice
- GitHub: https://github.com/ZuInnoTe/spark-hadoopoffice-ds/
- Package: https://spark-packages.org/package/ZuInnoTe/spark-hadoopoffice-ds
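As a rough illustration of what reading an Excel sheet with the Crealytics library can look like from PySpark, here is a minimal sketch. The helper name `read_excel_sheet`, its parameters, and the default sheet address are hypothetical and chosen only for this example. It assumes the spark-excel package is already on the Spark classpath and that the installed release uses the `header` option (older releases, such as those in some questions below, used `useHeader` instead).

```python
def read_excel_sheet(spark, path, data_address="'Sheet1'!A1", has_header=True):
    """Read one sheet of an Excel workbook into a Spark DataFrame.

    `spark` is an existing SparkSession. This helper is a hypothetical
    wrapper for illustration, not part of the spark-excel library.
    """
    reader = (
        spark.read.format("com.crealytics.spark.excel")
        # Assumption: newer spark-excel releases name this option "header";
        # older ones expect "useHeader".
        .option("header", "true" if has_header else "false")
        # Sheet name and top-left cell to start reading from.
        .option("dataAddress", data_address)
        # Let the library guess column types instead of reading all strings.
        .option("inferSchema", "true")
    )
    return reader.load(path)
```

With a running session this would be called as `df = read_excel_sheet(spark, "/path/to/workbook.xlsx")`. The path is a placeholder, and the exact option names should be checked against the README of the spark-excel version in use.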

45 questions

5 answers

How to construct a DataFrame from an Excel (xls, xlsx) file in Scala Spark?

I have a large Excel (xlsx and xls) file with multiple sheets and I need to convert it to an RDD or DataFrame so that it can be joined to another DataFrame later. I was thinking of using Apache POI to save it as a CSV and then reading the CSV into a DataFrame. But if…

asked May 26 '17 at 08:13

koiralo

2 answers

Reading Excel (.xlsx) file in pyspark

I am trying to read a .xlsx file from a local path in PySpark. I've written the code below: from pyspark.shell import sqlContext from pyspark.sql import SparkSession spark = SparkSession.builder \ .master('local') \ .appName('Planning')…

apache-spark pyspark spark-excel

asked Jan 22 '20 at 07:48

OMG

1 answer

How to create an Excel file with multiple sheets from multiple DataFrames in Scala/Spark?

In a Scala/Spark application I created two different DataFrames. My task is to create one Excel file with two sheets, one for each DataFrame. I decided to use the spark-excel library but I am a little bit confused. As far as I understand the future excel file is…

excel scala dataframe apache-spark spark-excel

asked Aug 29 '19 at 03:13

Nurzhan Nogerbek

1 answer

What are the mandatory options for loading an Excel file?

I have loaded an Excel file from S3 using the syntax below, but I am wondering about the options that need to be set here. Why is it mandatory to set all the options below for loading an Excel file? None of these options are mandatory for loading…

excel scala apache-spark apache-spark-sql spark-excel

asked Jun 08 '17 at 05:21

Garipaso

1 answer

How to write a Dataset to an Excel file using the HadoopOffice library in Apache Spark (Java)

Currently I am using com.crealytics.spark.excel to read an Excel file, but using this library I can't write the dataset to an Excel file. This link says that using the HadoopOffice library (org.zuinnote.spark.office.excel) we can read and write to…

java apache-spark apache-spark-sql spark-excel

asked Jun 28 '17 at 10:28

Shashi Kumar

3 answers

How to read multiple Excel files and concatenate them into one Apache Spark DataFrame?

Recently I wanted to do the Spark Machine Learning Lab from Spark Summit 2016. The training video is here and the exported notebook is available here. The dataset used in the lab can be downloaded from the UCI Machine Learning Repository. It contains a set of…

excel scala apache-spark apache-spark-dataset spark-excel

asked Mar 12 '17 at 14:38

tomaskazemekas

0 answers

Azure Databricks - Import statement failing in scala cells

Hope you are all doing well. We have been facing a weird issue with our notebooks. We are using a couple of Scala packages. When we import a Scala package in a Scala cell, the import fails with the error mentioned below. Here I am considering…

scala apache-spark databricks azure-databricks spark-excel

asked Jun 14 '22 at 14:20

rainingdistros

0 answers

Write Spark Dataset to Excel File along with partitioning

I have a Dataset similar to the below structure:

col_A  col_B  date
1      5      2021-04-14
2      7      2021-04-14
3      5      2021-04-14
4      9      2021-04-14

I am trying to use the below code in Spark…

scala apache-spark apache-spark-dataset spark-excel

asked Apr 26 '21 at 16:23

KCK

2 answers

spark-excel datatype issues

I am using the spark-excel package for processing MS Excel files using Spark 2.2. Some of the files fail to load as a Spark DataFrame with the exception below. If someone has faced this issue, can you please help to fix such data type…

excel apache-spark apache-spark-sql apache-poi spark-excel

asked Jan 17 '18 at 12:30

nilesh1212

2 answers

How to write a Dataset object to Excel in Spark Java?

I am reading an Excel file using the com.crealytics.spark.excel package. Below is the code to read an Excel file in Spark Java. Dataset SourcePropertSet = sqlContext.read() .format("com.crealytics.spark.excel") …

apache-spark pyspark apache-spark-sql spark-excel

asked Jun 24 '17 at 07:23

BHANUMATHI H M

1 answer

Why is the DataFrame not throwing RunTimeException with the "FAILFAST" option in Spark while reading using com.crealytics.spark.excel?

schema = df = spark.read.format("com.crealytics.spark.excel").\ option("useHeader", "true").\ option("mode", "FAILFAST"). \ schema(schema).\ option("dataAddress", "Sheet1"). \ …

excel apache-spark pyspark spark-excel

asked Dec 29 '21 at 11:46

Girish Jambkar

0 answers

PySpark - issue reading Excel data with "useHeader", "false"

I'm trying to read some Excel data into a PySpark DataFrame. I'm using the library: 'com.crealytics:spark-excel_2.11:0.11.1'. I don't have a header in my data. I'm able to read successfully when reading from column A onwards, but when I'm trying to…

pyspark spark-excel

asked Dec 31 '20 at 11:51

Abhishek Choudhary

1 answer

How to specify individual sheet names while writing multiple org.apache.spark.sql.Dataset objects into an .xls file using crealytics / spark-excel in Java?

I am trying to write different Java Datasets into an Excel file that will contain multiple sheets, using the crealytics/spark-excel library. com.crealytics …

java apache-spark dataset rdd spark-excel

asked Mar 03 '20 at 04:51

Niranjan Balasubramani

0 answers

How to read Excel column data that comes from a formula using Apache Spark

I am trying to read an Excel file in Spark. I am using the crealytics library for this. But my code fails because one of the columns refers to a sheet to populate its value through a VLOOKUP formula. I have been trying with "crealytics"…

java scala apache-spark spark-excel

asked Nov 05 '19 at 13:44

Anand Jha

0 answers

Is there any way to set the style of the Excel file via Apache POI in Scala/Spark application?

In my Scala application I use a Spark plugin (spark-excel) for creating and writing Excel files with several new sheets via Apache POI. import spark.implicits._ val df = Seq( ("2019-01-01 00:00:00", "7056589658"), ("2019-02-02 00:00:00",…

excel scala apache-spark apache-poi spark-excel

asked Sep 17 '19 at 19:00

Nurzhan Nogerbek
