Good afternoon to you all,
My data is in the following format:
ID: VALUE (tags assigned by users)
0001: "PC, THINKPAD, T500"
0002: "PHONE, CELLPHONE, IPHONE, APPLE, IPHONE5"
... and so on.
How can I write code to:
1) first, convert these records into a sequence file in key:value format (ID as the key, tag string as the value), and
2) then, convert that sequence file into vectors that can be used for k-means clustering?
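To make the two steps concrete, here is a rough plain-Java sketch of what I mean. It is only an illustration under my own assumptions: it does not use Hadoop's SequenceFile or any Mahout vector class, the "sequence file" is just an in-memory map, and the class and method names (TagVectorSketch, toKeyValue, buildDictionary, toSparseVector) are made up for this post. Each tag simply gets a count of 1.0 per occurrence, with no TF-IDF weighting.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class: a plain-Java stand-in for the two conversions,
// not the Hadoop/Mahout implementation.
class TagVectorSketch {

    // Step 1: parse lines such as  0001: "PC, THINKPAD, T500"  into ID -> tag-string pairs.
    static Map<String, String> toKeyValue(List<String> lines) {
        Map<String, String> records = new LinkedHashMap<>();
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // skip lines that have no "ID:" prefix
            }
            String id = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim().replace("\"", "");
            records.put(id, value);
        }
        return records;
    }

    // Step 2a: give every distinct tag an integer index, in order of first appearance.
    static Map<String, Integer> buildDictionary(Map<String, String> records) {
        Map<String, Integer> dictionary = new LinkedHashMap<>();
        for (String value : records.values()) {
            for (String tag : value.split(",")) {
                String t = tag.trim();
                if (!t.isEmpty()) {
                    dictionary.putIfAbsent(t, dictionary.size());
                }
            }
        }
        return dictionary;
    }

    // Step 2b: turn one tag string into a sparse vector (tag index -> occurrence count).
    static Map<Integer, Double> toSparseVector(String value, Map<String, Integer> dictionary) {
        Map<Integer, Double> vector = new TreeMap<>();
        for (String tag : value.split(",")) {
            Integer index = dictionary.get(tag.trim());
            if (index != null) {
                vector.merge(index, 1.0, Double::sum);
            }
        }
        return vector;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "0001: \"PC, THINKPAD, T500\"",
                "0002: \"PHONE, CELLPHONE, IPHONE, APPLE, IPHONE5\"");
        Map<String, String> records = toKeyValue(lines);
        Map<String, Integer> dictionary = buildDictionary(records);
        for (Map.Entry<String, String> e : records.entrySet()) {
            System.out.println(e.getKey() + " -> " + toSparseVector(e.getValue(), dictionary));
        }
    }
}
```

What I am missing is how to do the same thing with the real Hadoop and Mahout classes, so that the output can be fed to the k-means job.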
I have been looking at SequenceFilesFromDirectory and SparseVectorsFromSequenceFiles, but they seem a little complicated and hard to read right now.
So I wonder if anyone here could give me some simple sample code showing how to do the two conversions above?
Thank you very much!