The Spark Notebook is a web application that enables interactive and reproducible data analysis using Apache Spark from the browser.
Questions tagged [spark-notebook]
120 questions
26
votes
5 answers
What are SparkSession Config Options
I am trying to use SparkSession to convert a file's JSON data to an RDD with Spark Notebook. I already have the JSON file.
val spark = SparkSession
.builder()
.appName("jsonReaderApp")
.config("config.key.here", configValueHere)
…
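A minimal sketch of the usual builder pattern with a couple of configuration keys taken from the Spark documentation; the app name, config values, and JSON path are placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("jsonReaderApp")                                                // name from the question
  .master("local[*]")                                                      // assumption: a local session
  .config("spark.sql.shuffle.partitions", "200")                           // partitions used by shuffles and joins
  .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .getOrCreate()

// read the JSON file into a DataFrame; .rdd exposes the underlying RDD[Row]
val rdd = spark.read.json("path/to/file.json").rdd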

Sha2b
- 447
- 1
- 5
- 12
8
votes
3 answers
How to import libraries in Spark Notebook
I'm having trouble importing magellan-1.0.4-s_2.11 in spark notebook. I've downloaded the jar from https://spark-packages.org/package/harsha2010/magellan and have tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11…
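One hedged way to make a spark-packages artifact available, not specific to spark-notebook: let Spark resolve it at session start via spark.jars.packages (the coordinate is the one from the question, the app name is a placeholder):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("magellanDemo")
  .config("spark.jars.packages", "harsha2010:magellan:1.0.4-s_2.11")   // resolved like --packages
  .getOrCreate()

// once the session is up, the library's classes should be importable, e.g. import magellan.Point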

Curtis Chong
- 783
- 2
- 13
- 26
7
votes
2 answers
How to import one databricks notebook into another?
I have a Python notebook A in Azure Databricks with an import statement as below:
import xyz, datetime, ...
I have another notebook, xyz, that is imported into notebook A as shown in the code above.
When I run notebook A, it throws the following error:…
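A hedged sketch (shown in Scala, but the calls have the same names in Python) of the two Databricks mechanisms usually suggested instead of a plain import; dbutils is only predefined inside a Databricks cell, and the path and timeout are placeholders:

// Option 1: the %run magic inlines the other notebook's definitions into this session
//   %run ./xyz
// Option 2: execute the other notebook as a child run and collect its exit value
val result = dbutils.notebook.run("./xyz", 60)   // (path, timeoutSeconds)
println(result)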

user39602
- 339
- 2
- 5
- 13
5
votes
3 answers
How to show my existing column names instead of '_c0', '_c1', '_c2', '_c3', '_c4' in the first row?
The data frame is showing _c0, _c1 instead of my original column names in the first row.
I want to show my column names, which are in the first row of my CSV.
dff = spark.read.csv("abfss://dir@acname.dfs.core.windows.net/diabetes.csv")
…
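A minimal sketch, assuming the first line of diabetes.csv really holds the column names (shown in Scala; the PySpark reader takes the same options):

val dff = spark.read
  .option("header", "true")        // use the first row as column names instead of _c0, _c1, ...
  .option("inferSchema", "true")   // optional: infer column types instead of defaulting to strings
  .csv("abfss://dir@acname.dfs.core.windows.net/diabetes.csv")

dff.show(5)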

Gaurav Gangwar
- 467
- 3
- 11
- 24
5
votes
1 answer
Scala Spark : (org.apache.spark.repl.ExecutorClassLoader) Failed to check existence of class org on REPL class server at path
Running a basic df.show() after installing spark-notebook, I am getting the following error when running Scala Spark code on spark-notebook. Any idea when this occurs and how to avoid it?
[org.apache.spark.repl.ExecutorClassLoader] Failed to check…

Leothorn
- 1,345
- 1
- 23
- 45
5
votes
4 answers
recursive cte in spark SQL
; WITH Hierarchy as
(
select distinct PersonnelNumber
, Email
, ManagerEmail
from dimstage
union all
select e.PersonnelNumber
, e.Email
…
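Spark SQL has no recursive CTE, so the usual workaround is an iterative join that stops once no new rows appear. A rough sketch under that assumption, reusing the column and table names from the excerpt (the depth cap is arbitrary):

import org.apache.spark.sql.DataFrame
import spark.implicits._                               // assumes the notebook's `spark` session

// iteratively extend (employee, manager) pairs one level at a time until a fixed point
def expandHierarchy(edges: DataFrame, maxDepth: Int = 20): DataFrame = {
  var result = edges.select("PersonnelNumber", "Email", "ManagerEmail").distinct()
  var frontier = result
  var depth = 0
  while (depth < maxDepth && frontier.count() > 0) {
    frontier = frontier.as("e")
      .join(edges.as("m"), $"e.ManagerEmail" === $"m.Email")
      .select($"e.PersonnelNumber", $"e.Email", $"m.ManagerEmail")
      .except(result)                                  // keep only pairs not seen before
    result = result.union(frontier)
    depth += 1
  }
  result
}

val hierarchy = expandHierarchy(spark.table("dimstage"))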

SQLGirl
- 287
- 2
- 3
- 6
5
votes
0 answers
Want to run Spark(scala) kernel inside Jupyter Notebook. Getting OSError: [WinError 193] %1 is not a valid Win32 application
Traceback (most recent call last):
File "c:\users\rdx\anaconda3\lib\runpy.py", line 184, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\rdx\anaconda3\lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File…

Darshan
- 352
- 8
- 24
5
votes
2 answers
How do I create a Spark RDD from Accumulo 1.6 in spark-notebook?
I have a Vagrant image with Spark Notebook, Spark, Accumulo 1.6, and Hadoop all running. From notebook, I can manually create a Scanner and pull test data from a table I created using one of the Accumulo examples:
val instanceNameS = "accumulo"
val…
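A hedged sketch of the pattern usually used to turn that Scanner-style access into an RDD: configure AccumuloInputFormat on a Hadoop Job and pass it to newAPIHadoopRDD. The credentials, instance name, ZooKeeper hosts, and table are placeholders, and the setter names should be checked against the Accumulo 1.6 javadoc:

import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
import org.apache.accumulo.core.client.security.tokens.PasswordToken
import org.apache.accumulo.core.data.{Key, Value}
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance(sc.hadoopConfiguration)
AccumuloInputFormat.setConnectorInfo(job, "root", new PasswordToken("secret"))   // placeholder credentials
AccumuloInputFormat.setZooKeeperInstance(job, "accumulo", "localhost:2181")      // instance name, ZooKeeper hosts
AccumuloInputFormat.setInputTableName(job, "testTable")                          // placeholder table

val rdd = sc.newAPIHadoopRDD(job.getConfiguration,
  classOf[AccumuloInputFormat], classOf[Key], classOf[Value])
println(rdd.count())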

snerd
- 1,238
- 1
- 14
- 28
4
votes
1 answer
Is it possible to embed the HTML output of a Zeppelin Notebook so that the output can be looked at when the server hosting the Notebook isn't active?
I have a Zeppelin Notebook producing interactive graphs. I don't want to have to host the notebook indefinitely but I want to have that interactive output appear on another website. I understand that I can "link to this paragraph" and then embed the…

Danny David Leybzon
- 670
- 1
- 9
- 21
4
votes
3 answers
How to run spark-notebook on docker on MacOS X?
Running the spark-notebook using docker on OSX (via boot2docker) doesn't seem to do anything. Here's the output
pkerp@toc:~/apps/spark-notebook$ docker run -p 9000:9000 andypetrella/spark-notebook:0.1.4-spark-1.2.0-hadoop-1.0.4
Play server process…

juniper-
- 6,262
- 10
- 37
- 65
3
votes
1 answer
Can I run a stored procedure in a Spark pool (Synapse)?
I wanted to know how we can run a stored procedure that I created in the dedicated SQL pool from a Spark pool (Azure Synapse). Also, can we run SQL queries from a notebook to access data in the dedicated SQL pool?
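A hedged sketch of reaching the dedicated SQL pool from a Spark notebook over plain JDBC; the server, database, credentials, and query are placeholders. Executing the stored procedure itself still happens on the SQL pool side:

// placeholder connection string for the dedicated SQL pool
val jdbcUrl = "jdbc:sqlserver://myworkspace.sql.azuresynapse.net:1433;database=mydedicatedpool"

val df = spark.read
  .format("jdbc")
  .option("url", jdbcUrl)
  .option("query", "SELECT TOP 10 * FROM dbo.SomeTable")       // hypothetical query
  .option("user", "sqladminuser")                              // placeholder credentials
  .option("password", sys.env.getOrElse("SQL_PASSWORD", ""))
  .load()

df.show()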

darkstar
- 39
- 6
3
votes
1 answer
Setting spark.driver.maxResultSize in EMR notebook jupyter
I am using a Jupyter notebook in EMR to handle large chunks of data. While processing data I see this error:
An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due…
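If the job is hitting the spark.driver.maxResultSize cap (1g by default), the setting has to be applied before the session exists. A minimal sketch of doing that at session creation (in a managed EMR notebook the session is often pre-created, so the value may need to go into the notebook's session configuration instead):

import org.apache.spark.sql.SparkSession

val spark = SparkSession
  .builder()
  .appName("largeCollect")                        // placeholder
  .config("spark.driver.maxResultSize", "4g")     // "0" removes the limit entirely
  .getOrCreate()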

Amit Kumar
- 377
- 4
- 17
3
votes
2 answers
Execution of cmd cells in databricks notebook based on some condition
I have a Python 3.5 notebook in Databricks. I have a requirement to execute Databricks notebook cells based on some conditions. I didn't see any such functionality out of the box.
I have tried creating a python egg with the below code and installed it…
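A hedged, Databricks-only sketch (shown in Scala; the Python calls have the same names): there is no per-cell switch, but dbutils.notebook.exit stops the rest of the notebook, and dbutils.notebook.run lets a driver notebook decide which child notebook to execute. The condition, table, and paths are placeholders:

// skip everything below this cell when the precondition fails
val inputReady = spark.table("staging_events").count() > 0   // hypothetical condition
if (!inputReady) {
  dbutils.notebook.exit("no input, skipping the rest of the run")
}

// alternatively, from a driver notebook, choose which child notebook to run:
// val out = if (inputReady) dbutils.notebook.run("./process", 3600)
//           else dbutils.notebook.run("./noop", 600)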

Samrat De
- 63
- 1
- 4
3
votes
0 answers
Spark Notebook is quicker than executing a jar
I have finished some code in Spark Notebook, tried to move it into a real project, used sbt to generate a jar, and then used spark-submit to execute it.
Problem: it takes just 10 minutes to get the result in Spark Notebook, but it takes almost…

Leyla Lee
- 466
- 5
- 19
3
votes
1 answer
Cell width Jupyter notebook - Apache Toree - Scala
How do I increase cell width of a Jupyter notebook with Apache Toree - Scala kernel?
The usual
from IPython.core.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
indeed does not work.

mastro
- 619
- 1
- 8
- 17