I have a Spark DataFrame which I can convert to a pandas DataFrame using the toPandas() method available in PySpark.
I have the following questions about this:
- Does this conversion defeat the purpose of using Spark itself (distributed computing)?
- The dataset is going to be huge, so what about speed and memory issues?
- If somebody could also explain what exactly happens with this one line of code, that would really help.
Thanks