I am trying to get up to speed with Python and PySpark. I followed this page on installing and checking PySpark in Anaconda on Windows. The following checking code works:
>>> import findspark
>>> findspark.init()
>>> findspark.find()
'C:\\Users\\User.Name\\anaconda3\\envs\\py39\\lib\\site-packages\\pyspark'
>>> from pyspark.sql import SparkSession
>>> spark = SparkSession.builder.appName('SparkExamples.com').getOrCreate()
>>> data = [("Java","20000"), ("Python","100000"), ("Scala","3000")]
>>> columns = ["language","users_count"]
>>> df = spark.createDataFrame(data).toDF(*columns)
>>> df.show()
+--------+-----------+
|language|users_count|
+--------+-----------+
| Java| 20000|
| Python| 100000|
| Scala| 3000|
+--------+-----------+
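Out of curiosity, I also checked the types of the two objects; df is a DataFrame, while spark is a SparkSession:
>>> type(spark)
<class 'pyspark.sql.session.SparkSession'>
>>> type(df)
<class 'pyspark.sql.dataframe.DataFrame'>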
I tried accessing the online help for the methods createDataFrame and toDF. Getting help on createDataFrame was straightforward: help(spark.createDataFrame). I haven't been able to access the online help for toDF:
>>> help(spark.toDF)
AttributeError: 'SparkSession' object has no attribute 'toDF'
>>> help(DataFrame.toDF)
NameError: name 'DataFrame' is not defined
>>> help(spark.DataFrame.toDF)
AttributeError: 'SparkSession' object has no attribute 'DataFrame'
>>> help(DataFrame)
NameError: name 'DataFrame' is not defined
>>> help(spark.DataFrame)
AttributeError: 'SparkSession' object has no attribute 'DataFrame'
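From the type() output above, my guess is that toDF is defined on the DataFrame class rather than on SparkSession, so perhaps the help has to be reached through that class or through a DataFrame instance, something like the sketch below (just a guess; I am inferring that DataFrame can be imported from pyspark.sql the same way SparkSession was):

from pyspark.sql import DataFrame   # guessing this import, mirroring SparkSession
help(DataFrame.toDF)                # help via the class?
help(df.toDF)                       # or via the df instance from the checking code?

I haven't been able to confirm whether this is the intended approach, hence my questions: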
(1) How is the documentation accessed?
(2) Is there a scheme for accessing the help that one can infer from the checking code above?