I have saved a Spark DataFrame as a Parquet file using df.write.parquet("filePath").
I now want to apply a scikit-learn SVM classifier to this DataFrame's columns, since
a multiclass SVM is absent from the Spark ML/MLlib library.
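For context, the end goal is something like the following sketch; scikit-learn's SVC supports multiclass classification natively (one-vs-one by default), which is what is missing from Spark ML/MLlib. The toy data below is purely illustrative and stands in for the real columns:

```python
import numpy as np
from sklearn.svm import SVC

# Toy multiclass data (3 well-separated classes) standing in for the real features.
X = np.array([[0.0], [0.1], [1.0], [1.1], [2.0], [2.1]])
y = np.array([0, 0, 1, 1, 2, 2])

# SVC handles more than two classes out of the box (one-vs-one internally).
clf = SVC(kernel="linear")
clf.fit(X, y)
pred = clf.predict([[1.05]])
```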
I tried reading the saved data back with spark.read.parquet('FilePath') and then
converting it to pandas with toPandas(). However, this conversion from a Spark
DataFrame to a pandas DataFrame takes too much time because the data is huge (81 GB).
Is it possible to read the saved Spark DataFrame directly into a pandas DataFrame?
If not, what will be the best way to proceed further?