
I have saved a Spark DataFrame as a Parquet file using df.write.parquet("filePath"). I want to apply scikit-learn's SVM classifier to this DataFrame's columns, since multiclass SVM is absent from the Spark ML/MLlib library.
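For context, this is the kind of modeling step I'm aiming for once the data is in pandas (a minimal sketch with synthetic stand-in data; the real features and labels come from the Parquet file, and the column layout here is just an assumption):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the features/labels stored in the Parquet data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # 100 rows, 4 feature columns
y = rng.integers(0, 3, size=100)       # three classes -> multiclass problem

clf = SVC()                            # scikit-learn supports multiclass
clf.fit(X, y)                          # (one-vs-one under the hood)
preds = clf.predict(X[:5])
```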

I tried reading the saved data with spark.read.parquet('FilePath') and then converting it to pandas with toPandas(). However, this conversion from a Spark DataFrame to a pandas DataFrame takes too much time because the data is huge (81 GB). Is it possible to read the saved Spark DataFrame directly into a pandas DataFrame?

If not, what will be the best way to proceed further?

  • Possible duplicate of [How to read a Parquet file into Pandas DataFrame?](https://stackoverflow.com/questions/33813815/how-to-read-a-parquet-file-into-pandas-dataframe) – Kirk Broadhurst Feb 28 '19 at 17:00

0 Answers