Spark VectorAssembler

Asked Apr 05 '17 at 07:17

Active Apr 05 '17 at 07:17

Viewed 795 times

As far as I know, VectorAssembler combines multiple columns into a single column containing a Vector. You can then pass that column to various ML algorithms and preprocessing stages.

Is there something like a "VectorDisassembler", that is, a helper that takes one Vector column and splits its values back into multiple columns (e.g. at the end of an ML pipeline)?

If not, what is the best way to achieve that (preferably in Python)?

Here's what I had in mind:

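from pyspark.sql import Row
PcaComponents = Row(*["p" + str(i) for i in range(35)])
# On Spark 2.x+, PySpark DataFrames have no .map, so go through .rdd; toArray() handles dense and sparse vectors
pca_features = reduced_dataset_df.rdd.map(lambda x: PcaComponents(*x[0].toArray().tolist())).toDF()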

Can we do better?

asked Apr 05 '17 at 07:17

Kobe-Wan Kenobi


Have a look at [this answer](https://stackoverflow.com/a/38385033/6028910). Unfortunately, it seems the only ways to do this with the Spark DataFrames API are to use a UDF (which tends to be slow) or to fall back to an RDD approach. – Shane Halloran Nov 08 '17 at 20:07
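For readers on Spark 3.0 or later, a third route avoids both the UDF and the RDD round trip: `pyspark.ml.functions.vector_to_array` turns the Vector column into an array column, which can then be indexed per element. The sketch below is only an illustration under that version assumption. The helper names `component_names` and `disassemble_vector` are made up for this example and are not part of Spark, and the vector length `n` is assumed to be known up front (35 in the question).

```python
def component_names(n, prefix="p"):
    """Return ['p0', 'p1', ...], the naming scheme used in the question."""
    return [prefix + str(i) for i in range(n)]


def disassemble_vector(df, vector_col, n, prefix="p"):
    """Split a Vector column of known length n into n double columns.

    Hypothetical helper, not a Spark API. Requires Spark >= 3.0.
    """
    # Imported here so component_names stays usable without Spark installed.
    from pyspark.ml.functions import vector_to_array
    from pyspark.sql.functions import col

    # Convert the ML Vector into a plain array<double> column.
    arr = vector_to_array(col(vector_col))
    # Keep every other column as-is and drop the original vector column.
    keep = [col(c) for c in df.columns if c != vector_col]
    # One output column per vector element: p0, p1, ...
    parts = [arr[i].alias(name)
             for i, name in enumerate(component_names(n, prefix))]
    return df.select(*(keep + parts))
```

With the DataFrame from the question, where the vector sits in the first column, the call would look like `disassemble_vector(reduced_dataset_df, reduced_dataset_df.columns[0], 35)`. Because the projection stays inside the DataFrame API, no Python function runs per row, which is the cost the UDF and RDD variants pay.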
