Questions tagged [spark-notebook]

The Spark Notebook is a web application enabling interactive and reproducible data analysis using Apache Spark from the browser.

120 questions
26
votes
5 answers

What are SparkSession Config Options

I am trying to use SparkSession to convert JSON data of a file to RDD with Spark Notebook. I already have the JSON file. val spark = SparkSession .builder() .appName("jsonReaderApp") .config("config.key.here", configValueHere) …
Sha2b
  • 447
  • 1
  • 5
  • 12
8
votes
3 answers

How to import libraries in Spark Notebook

I'm having trouble importing magellan-1.0.4-s_2.11 in spark notebook. I've downloaded the jar from https://spark-packages.org/package/harsha2010/magellan and have tried placing SPARK_HOME/bin/spark-shell --packages harsha2010:magellan:1.0.4-s_2.11…
Curtis Chong
  • 783
  • 2
  • 13
  • 26
7
votes
2 answers

How to import one databricks notebook into another?

I have a python notebook A in Azure Databricks having import statement as below: import xyz, datetime, ... I have another notebook xyz being imported in notebook A as shown in above code. When I run notebook A, it throws the following error:…
user39602
  • 339
  • 2
  • 5
  • 13
5
votes
3 answers

How to show my existing column names instead of '_c0', '_c1', '_c2', '_c3', '_c4' in the first row?

The data frame shows _c0, _c1 instead of my original column names in the first row. I want to show my column names, which are in the first row of my CSV. dff = spark.read.csv("abfss://dir@acname.dfs.core.windows.net/ diabetes.csv") …
5
votes
1 answer

Scala Spark : (org.apache.spark.repl.ExecutorClassLoader) Failed to check existence of class org on REPL class server at path

After installing Spark Notebook, I get the following error when running a basic df.show() in Scala Spark code on spark-notebook. Any idea when this occurs and how to avoid it? [org.apache.spark.repl.ExecutorClassLoader] Failed to check…
Leothorn
  • 1,345
  • 1
  • 23
  • 45
5
votes
4 answers

recursive cte in spark SQL

; WITH Hierarchy as ( select distinct PersonnelNumber , Email , ManagerEmail from dimstage union all select e.PersonnelNumber , e.Email …
SQLGirl
  • 287
  • 2
  • 3
  • 6
5
votes
0 answers

Want to run Spark(scala) kernel inside Jupyter Notebook. Getting OSError: [WinError 193] %1 is not a valid Win32 application

Traceback (most recent call last): File "c:\users\rdx\anaconda3\lib\runpy.py", line 184, in _run_module_as_main "__main__", mod_spec) File "c:\users\rdx\anaconda3\lib\runpy.py", line 85, in _run_code exec(code, run_globals) File…
5
votes
2 answers

How do I create a Spark RDD from Accumulo 1.6 in spark-notebook?

I have a Vagrant image with Spark Notebook, Spark, Accumulo 1.6, and Hadoop all running. From notebook, I can manually create a Scanner and pull test data from a table I created using one of the Accumulo examples: val instanceNameS = "accumulo" val…
snerd
  • 1,238
  • 1
  • 14
  • 28
4
votes
1 answer

Is it possible to embed the HTML output of a Zeppelin Notebook so that the output can be looked at when the server hosting the Notebook isn't active?

I have a Zeppelin Notebook producing interactive graphs. I don't want to have to host the notebook indefinitely but I want to have that interactive output appear on another website. I understand that I can "link to this paragraph" and then embed the…
Danny David Leybzon
  • 670
  • 1
  • 9
  • 21
4
votes
3 answers

How to run spark-notebook on docker on MacOS X?

Running the spark-notebook using docker on OSX (via boot2docker) doesn't seem to do anything. Here's the output pkerp@toc:~/apps/spark-notebook$ docker run -p 9000:9000 andypetrella/spark-notebook:0.1.4-spark-1.2.0-hadoop-1.0.4 Play server process…
juniper-
  • 6,262
  • 10
  • 37
  • 65
3
votes
1 answer

Can I run a stored procedure in a Synapse Spark pool?

I want to know how to run, in a Spark pool (Azure Synapse), a stored procedure that I created in a dedicated SQL pool. Also, can we run SQL queries in a notebook to access data in the ddsql pool?
3
votes
1 answer

Setting spark.driver.maxResultSize in EMR notebook jupyter

I am using Jupyter notebook in emr to handle large chunks of data. While processing data I see this error: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe. : org.apache.spark.SparkException: Job aborted due…
3
votes
2 answers

Execution of cmd cells in databricks notebook based on some condition

I have a python 3.5 notebook in databricks. I have a requirement to execute databricks notebook cells based on some conditions. I didn't see any functionality out of the box. I have tried creating a python egg with the below code and installed it…
3
votes
0 answers

Spark Notebook is quicker than executing a jar

I finished some code in Spark Notebook, then tried to move it into a real project: I used sbt to generate a jar and spark-submit to execute it. Problem: It takes just 10 minutes to get the result in Spark Notebook, but it takes almost…
Leyla Lee
  • 466
  • 5
  • 19
3
votes
1 answer

Cell width Jupyter notebook - Apache Toree - Scala

How do I increase cell width of a Jupyter notebook with Apache Toree - Scala kernel? The usual from IPython.core.display import display, HTML display(HTML("")) indeed does not work.
mastro
  • 619
  • 1
  • 8
  • 17