OneHotEncoder with input being an array

Question

I am trying to extract features from my raw data.

My raw data is a Seq[String].

I want to turn this into a one-hot-style encoding with several 1s instead of only one, but it seems that the Spark ML encoder (https://spark.apache.org/docs/latest/ml-features.html#onehotencoderestimator) only accepts a single String as input.

Maybe I am blind, but I can't seem to find one that accepts a list of strings.

Thank you.

So I used `HashingTF`, but how do you go back from the encoding to the token? I am going to take a look at `CountVectorizer`. — Wonay, Jul 18 '18 at 17:51
If you need details go with `CountVectorizer` - [How to get word details from TF Vector RDD in Spark ML Lib?](https://stackoverflow.com/q/32285699/8371915) — Alper t. Turker, Jul 18 '18 at 18:03
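
The `CountVectorizer` route suggested in the comments could look roughly like the sketch below. It is a minimal illustration, not the accepted answer's code: the sample rows, the column names `id`, `tokens` and `features`, and the local `SparkSession` settings are all assumptions. Setting `setBinary(true)` caps every non-zero count at 1.0, which gives several 1s per row, and the fitted model's `vocabulary` array maps each vector index back to its token, which `HashingTF` cannot do because hashing is not reversible.

```scala
import org.apache.spark.ml.feature.{CountVectorizer, CountVectorizerModel}
import org.apache.spark.sql.SparkSession

object MultiHotSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .appName("multi-hot-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data: one Seq[String] per row.
    val df = Seq(
      (0, Seq("a", "b", "c")),
      (1, Seq("a", "c")),
      (2, Seq("b", "b", "d"))
    ).toDF("id", "tokens")

    // binary = true turns term counts into 0/1 flags,
    // so a repeated token still contributes a single 1.
    val model: CountVectorizerModel = new CountVectorizer()
      .setInputCol("tokens")
      .setOutputCol("features")
      .setBinary(true)
      .fit(df)

    // Each row gets a sparse vector with a 1.0 for every token present.
    model.transform(df).show(truncate = false)

    // vocabulary(i) is the token stored at vector index i,
    // which lets you go back from the encoding to the tokens.
    model.vocabulary.zipWithIndex.foreach { case (token, idx) =>
      println(s"$idx -> $token")
    }

    spark.stop()
  }
}
```

The vocabulary is learned from the data passed to `fit`, so tokens that were not seen at fit time are ignored at transform time.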

Wonay · Accepted Answer · 2018-07-18T19:53:49.923

edited Jul 18 '18 at 19:53

answered Jul 18 '18 at 17:59
