I am reading a CSV file like this:

    data = sc.textFile("filename")
    Df = sqlContext.createDataFrame(...)
    Pdf = Df.toPandas()

Is `Pdf` now distributed across the Spark cluster, or does it reside only on the host (driver) machine?
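No. `Pdf` is an ordinary pandas DataFrame that lives entirely in the memory of the driver process. `toPandas()` collects every row from the executors back to the driver, so nothing about the result is distributed. As the docstring of `DataFrame.toPandas()` in the PySpark source says: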
.. note:: This method should only be used if the resulting Pandas's DataFrame is expected
    to be small, as all the data is loaded into the driver's memory.
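If the data might be large, one way to protect the driver is to cap the number of rows before calling `toPandas()`. The helper below, `to_pandas_capped`, is hypothetical and not part of PySpark. It assumes only that `df` is a `pyspark.sql.DataFrame` and uses the real `DataFrame.limit()` and `DataFrame.toPandas()` methods. The `max_rows` default is an arbitrary placeholder, not a recommended value.

```python
def to_pandas_capped(df, max_rows=100_000):
    """Collect a Spark DataFrame into a local pandas DataFrame on the driver,
    refusing if it has more than `max_rows` rows.

    Hypothetical helper: `df` is assumed to be a pyspark.sql.DataFrame.
    Only .limit() and .toPandas() are called on it.
    """
    # Ask Spark for at most max_rows + 1 rows. The extra row tells us the
    # full DataFrame is too big, without pulling all of it into the driver.
    pdf = df.limit(max_rows + 1).toPandas()
    if len(pdf) > max_rows:
        raise ValueError(
            f"DataFrame has more than {max_rows} rows; "
            "refusing to collect it into driver memory"
        )
    # pdf is a plain pandas object held in the driver process, not distributed.
    return pdf
```

Usage: `Pdf = to_pandas_capped(Df, max_rows=50_000)`. Fetching `max_rows + 1` rows lets the helper detect an oversized result in a single Spark job rather than running a separate `count()`, and it never brings more than one extra row into the driver.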