I am wondering whether I can debug PySpark code in a Jupyter notebook. I have tried the solution for regular Python code in Jupyter using the ipdb module, described here:

What is the right way to debug in iPython notebook?

But it does not work in a notebook with a PySpark kernel.

Please note: my question is about debugging PySpark within a Jupyter notebook, not in IntelliJ IDEA or any other Python IDE.

Background:

  • I am on macOS Yosemite.
  • My Spark version is 1.6.2.
  • Jupyter kernel: Apache Toree PySpark
  • I have ipdb installed.

Any help would be greatly appreciated.


1 Answer


If you want to play around with and debug PySpark code in a Jupyter notebook, then once Spark is installed and set up (a good guide is here: https://blog.sicara.com/get-started-pyspark-jupyter-guide-tutorial-ae2fe84f594f), you can import SparkSession and create a local instance:

from pyspark.sql import SparkSession

# Create a local Spark session using a single worker thread.
spark = SparkSession.builder.master("local[1]").appName("pyspark-test").getOrCreate()

# Read a CSV into a DataFrame you can inspect interactively.
df = spark.read.csv("test.csv", header=True)
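One caveat worth knowing: `pdb`/`ipdb` breakpoints only fire in code that runs on the driver; code inside a UDF executes on the executors, where there is no interactive terminal to attach to. A common workaround (a sketch of my own, not part of the answer above; the `parse_row` function and the sample rows are hypothetical) is to pull a small sample to the driver and debug the per-row logic as plain Python, where the debugger works normally:

```python
import pdb

# Hypothetical per-row transformation that you would normally wrap in a UDF.
def parse_row(row):
    # pdb.set_trace()  # uncomment to step through interactively on the driver
    return {"name": row["name"].strip(), "age": int(row["age"])}

# Debug against a small local sample instead of the full DataFrame;
# with a real DataFrame you would obtain this via df.take(5) or
# [r.asDict() for r in df.limit(5).collect()].
sample = [{"name": " Alice ", "age": "30"}, {"name": "Bob", "age": "25"}]

parsed = [parse_row(r) for r in sample]
print(parsed)  # [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```

Once the logic works on the local sample, wrap it back into a UDF or `rdd.map` call and run it on the full dataset.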