
I'm spinning up on both Python and PySpark. I followed this page on installing PySpark in Anaconda on Windows. I then tried to get online help on the DataFrame class and its toDF method. According to this explanation, the required import (and the subsequent help commands) are:

from pyspark.sql import DataFrame # User import command
help(DataFrame)
help(DataFrame.toDF)

The code works, but I don't understand why, even after reading extensively on packages, modules, and initialization (e.g., here, here, and here).

The DataFrame class is defined in the pyspark package, subpackage sql, module file dataframe.py. The file pyspark/sql/__init__.py contains the initialization import:

# __init__.py import command
from pyspark.sql.dataframe import DataFrame, DataFrameNaFunctions, DataFrameStatFunctions
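
For concreteness, the relevant (abridged) layout is:

pyspark/
    __init__.py
    sql/
        __init__.py      # contains the import shown above
        dataframe.py     # defines class DataFrame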

I see how this __init__.py import command puts the DataFrame class into the namespace of the pyspark.sql package. For the User import command at the top to run, however, it seems that DataFrame would have to appear as a module within the pyspark.sql subpackage. I don't see how the __init__.py import command accomplishes this.
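
To make my confusion concrete, here is a small check (assuming a working PySpark install; inspect is from the standard library). The User import command succeeds even though the imported name is not a module:

from pyspark.sql import DataFrame                     # the User import command succeeds
import pyspark.sql.dataframe                          # the module file that defines the class
import inspect

print(inspect.ismodule(DataFrame))                    # False: DataFrame is not a module
print(inspect.isclass(DataFrame))                     # True: it is a class
print(DataFrame is pyspark.sql.dataframe.DataFrame)   # True: the very class from dataframe.py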

Can someone explain, point to a key passage in one of my cited resources, and/or refer me to other information?
