
When I run the example code in cmd (the `pyspark` shell), everything works fine:

>>> import pyspark
>>> l = [('Alice', 1)]
>>> spark.createDataFrame(l).collect()
[Row(_1='Alice', _2=1)]

But when I execute the same code in PyCharm, I get an error:

spark.createDataFrame(l).collect()
NameError: name 'spark' is not defined

Maybe something is wrong with how I linked PyCharm to pyspark.

Environment Variable

Project Structure

Project Interpreter

  • Are you missing the part where you define `spark`: `from pyspark.sql import SparkSession; spark=SparkSession.builder.getOrCreate()`? What version of spark? – pault Oct 30 '19 at 17:35
  • spark version: 2.4.4 But in CMD, I also don't define spark. – jiaying chen Oct 30 '19 at 17:43
  • You don't have to define those in pyspark shell - they are automatically defined for you – pault Oct 30 '19 at 18:27

1 Answer


When you start pyspark from the command line, a SparkSession object and a SparkContext are already available to you as `spark` and `sc` respectively.

To use them in PyCharm, you have to create these variables yourself first:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

EDIT:

Please have a look at : Failed to locate the winutils binary in the hadoop binary path
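
On Windows, a common workaround is to point `HADOOP_HOME` at the directory containing `bin\winutils.exe` before the SparkSession is created. A hedged sketch, assuming `winutils.exe` has already been downloaded into `C:\hadoop-2.7.7\bin` (the path shown in the error message in the comments):

```python
import os

# Must run before SparkSession.builder.getOrCreate().
# HADOOP_HOME points at the folder whose bin\ subfolder holds winutils.exe.
os.environ["HADOOP_HOME"] = r"C:\hadoop-2.7.7"
os.environ["PATH"] += os.pathsep + r"C:\hadoop-2.7.7\bin"
```

Setting these in the PyCharm run configuration's environment variables achieves the same thing without code changes.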

  • After that, I get a new error. `19/10/30 13:46:36 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable C:\hadoop-2.7.7\bin\winutils.exe in the Hadoop binaries.` – jiaying chen Oct 30 '19 at 17:53
  • How did you install pyspark? I've never used windows, so I don't really know. Will be happy to look it up.. EDIT: I have edited my answer – pissall Oct 30 '19 at 17:54