As far as I know, VectorAssembler lets you combine multiple columns into a single column containing a Vector. You can later pass this column to various ML algorithms and preprocessing steps.

I would like to know whether there's something like a "VectorDisassembler", that is, a helper that takes one Vector column and splits its values back into multiple columns (e.g. at the end of an ML pipeline).

If not, what is the best way to achieve that (preferably in Python)?

Here's what I had in mind:

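from pyspark.sql import Row
# One Row field per PCA component (35 here); x[0] assumes the Vector is the first column
PcaComponents = Row(*["p" + str(i) for i in range(35)])
pca_features = reduced_dataset_df.rdd.map(lambda x: PcaComponents(*x[0].toArray().tolist())).toDF()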

Can we do better?

Kobe-Wan Kenobi
  • Have a look at [this answer](https://stackoverflow.com/a/38385033/6028910). Unfortunately, it seems the only way to do it with the Spark DataFrame API is either to use a UDF (which tends to be slow) or to use an RDD approach. – Shane Halloran Nov 08 '17 at 20:07
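
For reference, the UDF route mentioned in the comment could look roughly like the sketch below. It is not a built-in "VectorDisassembler"; the helper names `component_names`, `vector_to_list` and `disassemble`, the temporary column `_arr`, and the `p0, p1, ...` naming are made up for illustration, and the vector length `n` is assumed to be known in advance.

```python
def component_names(n, prefix="p"):
    """Column names p0, p1, ... for an n-dimensional vector (hypothetical naming)."""
    return [f"{prefix}{i}" for i in range(n)]


def vector_to_list(v):
    """Convert a vector-like value to a plain list of Python floats.

    Uses toArray() when available (pyspark.ml.linalg vectors have it),
    otherwise treats the value as an ordinary sequence.
    """
    values = v.toArray() if hasattr(v, "toArray") else v
    return [float(x) for x in values]


def disassemble(df, vector_col, n, prefix="p"):
    """Return df with vector_col expanded into n double columns (UDF approach)."""
    # Imported here so the two helpers above stay usable without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import ArrayType, DoubleType

    # Python UDF: Vector -> array<double>
    to_array = F.udf(vector_to_list, ArrayType(DoubleType()))
    # "_arr" is a temporary column name chosen for this sketch.
    with_arr = df.withColumn("_arr", to_array(F.col(vector_col)))
    other_cols = [c for c in df.columns if c != vector_col]
    split_cols = [
        F.col("_arr")[i].alias(name)
        for i, name in enumerate(component_names(n, prefix))
    ]
    return with_arr.select(*other_cols, *split_cols)
```

With the DataFrame from the question this would be called as `disassemble(reduced_dataset_df, vector_col, 35)`, where `vector_col` is the name of the Vector column. On Spark 3.0 or later, `pyspark.ml.functions.vector_to_array` should be able to replace the Python UDF and avoid most of its overhead.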

0 Answers