I'm trying to run k-means clustering using Apache Spark's MLlib library. I have everything set up, but I'm not sure how to format the input data. I'm relatively new to machine learning, so any help would be appreciated.
In the sample data.txt the data is as follows:
0.0 0.0 0.0
0.1 0.1 0.1
0.2 0.2 0.2
9.0 9.0 9.0
9.1 9.1 9.1
9.2 9.2 9.2
The data I want to run the algorithm on is currently in this format (a JSON array):
[{"customer":"ddf6022","order_id":"20031-19958","asset_id":"dd1~33","price":300,"time":1411134115000,"location":"bt2"},{"customer":"ddf6023","order_id":"23899-23825","asset_id":"dd1~33","price":300,"time":1411954672000,"location":"bt2"}]
How can I convert it into something that can be used with the k-means clustering algorithm? I'm using Java, and I'm guessing the data needs to end up in a JavaRDD, but I have no idea how to go about that.
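To make the question more concrete, here is a rough sketch of the kind of conversion I have in mind, in plain Java without Spark. It assumes the JSON has already been parsed into simple objects (for example with a JSON library such as Jackson or Gson, not shown), and that only `price`, `time` and `location` are used as features; that choice, and the names `OrderFeatures`, `Order`, `toVectors` and `toLine`, are my own assumptions for illustration. The location string is one-hot encoded, and each record is printed as a space-separated line like the ones in data.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class OrderFeatures {

    /** Hypothetical holder for one parsed JSON record (only the fields used as features). */
    static final class Order {
        final String location;
        final double price;
        final long timeMillis;

        Order(String location, double price, long timeMillis) {
            this.location = location;
            this.price = price;
            this.timeMillis = timeMillis;
        }
    }

    /** Assigns each distinct location an index 0, 1, 2, ... in order of first appearance. */
    static Map<String, Integer> indexLocations(List<Order> orders) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (Order o : orders) {
            index.putIfAbsent(o.location, index.size());
        }
        return index;
    }

    /** Builds one numeric vector per order: [price, time in days, one-hot location...]. */
    static List<double[]> toVectors(List<Order> orders) {
        Map<String, Integer> index = indexLocations(orders);
        List<double[]> vectors = new ArrayList<>();
        for (Order o : orders) {
            double[] v = new double[2 + index.size()];
            v[0] = o.price;
            v[1] = o.timeMillis / 86_400_000.0;  // milliseconds since epoch -> days
            v[2 + index.get(o.location)] = 1.0;  // one-hot slot for this location
            vectors.add(v);
        }
        return vectors;
    }

    /** Formats a vector as space-separated numbers, like the lines in data.txt. */
    static String toLine(double[] v) {
        return Arrays.stream(v).mapToObj(Double::toString).collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // The two records from the JSON array above, reduced to the assumed feature fields.
        List<Order> orders = Arrays.asList(
                new Order("bt2", 300, 1411134115000L),
                new Order("bt2", 300, 1411954672000L));
        for (double[] v : toVectors(orders)) {
            System.out.println(toLine(v));
        }
    }
}
```

If this is roughly the right direction, I assume each `double[]` could then be wrapped with `Vectors.dense(...)` from `org.apache.spark.mllib.linalg` and the list passed to `JavaSparkContext.parallelize(...)` to get a `JavaRDD<Vector>`. I also realise that price and time are on very different scales here, which k-means is sensitive to, so some scaling is probably needed as well.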