
When I run the example code in cmd (the `pyspark` shell), everything works fine:

>>> import pyspark
>>> l = [('Alice', 1)]
>>> spark.createDataFrame(l).collect()
[Row(_1='Alice', _2=1)]

But when I execute the same code in PyCharm, I get an error:

spark.createDataFrame(l).collect()
NameError: name 'spark' is not defined

Maybe something is wrong with how I linked PyCharm to pyspark.

Environment Variable

Project Structure

Project Interpreter

  • Are you missing the part where you define `spark`: `from pyspark.sql import SparkSession; spark=SparkSession.builder.getOrCreate()`? What version of spark? – pault Oct 30 '19 at 17:35
  • spark version: 2.4.4 But in CMD, I also don't define spark. – jiaying chen Oct 30 '19 at 17:43
  • You don't have to define those in pyspark shell - they are automatically defined for you – pault Oct 30 '19 at 18:27

1 Answer


When you start pyspark from the command line, a SparkSession object and a SparkContext are already available to you as `spark` and `sc` respectively.

To use them in PyCharm, you have to create these variables yourself first:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

EDIT:

Please have a look at : Failed to locate the winutils binary in the hadoop binary path
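
On Windows, a common workaround is to point `HADOOP_HOME` at the directory containing `bin\winutils.exe` before the SparkSession is created. A hedged sketch, assuming `winutils.exe` has already been downloaded into `C:\hadoop-2.7.7\bin` (the path shown in the error message in the comments):

```python
import os

# Must run before SparkSession.builder.getOrCreate().
# HADOOP_HOME points at the folder whose bin\ subfolder holds winutils.exe.
os.environ["HADOOP_HOME"] = r"C:\hadoop-2.7.7"
os.environ["PATH"] += os.pathsep + r"C:\hadoop-2.7.7\bin"
```

Setting these in the PyCharm run configuration's environment variables achieves the same thing without code changes.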

  • After that, I get a new error. `19/10/30 13:46:36 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable C:\hadoop-2.7.7\bin\winutils.exe in the Hadoop binaries.` – jiaying chen Oct 30 '19 at 17:53
  • How did you install pyspark? I've never used windows, so I don't really know. Will be happy to look it up.. EDIT: I have edited my answer – pissall Oct 30 '19 at 17:54