I have a file containing the original dataset, which looks like this:
1 1 1 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935
1 3 5 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935
1 2 4 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935
1 10 3 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935
1 5 5 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935
1 7 4 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935
2 7 1 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
2 8 5 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
2 1 5 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
2 5 1 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
2 4 4 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
2 3 2 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
3 5 4 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
3 7 5 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
3 4 1 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
3 8 3 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
3 2 1 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
4 5 4 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
4 9 1 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
4 8 4 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
4 4 4 40.576842275831495 27.617035602470935 40.5758422758315 27.616035602470934
4 10 5 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
4 7 3 40.576842275831495 27.617035602470935 40.576842275831495 27.617035602470935
5 5 1 40.5758422758315 27.616035602470934 40.576842275831495 27.617035602470935
5 2 4 40.5758422758315 27.616035602470934 40.576842275831495 27.617035602470935
5 6 1 40.5758422758315 27.616035602470934 40.5758422758315 27.616035602470934
5 7 3 40.5758422758315 27.616035602470934 40.576842275831495 27.617035602470935
5 10 2 40.5758422758315 27.616035602470934 40.576842275831495 27.617035602470935
5 9 5 40.5758422758315 27.616035602470934 40.5758422758315 27.616035602470934
The first column is a UserID, the second a StoreID and the third a Rating; the fourth and fifth columns are the lng and lat of the user's current location, and the sixth and seventh are the lng and lat of the store.
Each row represents one user post.
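To make the row layout concrete, here is a minimal parsing sketch, assuming the columns are whitespace-separated and appear in the order described above. The `Post` type, its field names and `parse_post` are hypothetical names chosen for illustration.

```python
from typing import NamedTuple


class Post(NamedTuple):
    """One row of the dataset, i.e. one user post (field names are hypothetical)."""
    user_id: int
    store_id: int
    rating: int
    user_lng: float   # column 4: lng of the user's current location
    user_lat: float   # column 5: lat of the user's current location
    store_lng: float  # column 6: lng of the store
    store_lat: float  # column 7: lat of the store


def parse_post(line: str) -> Post:
    """Parse one whitespace-separated line into a Post, rejecting malformed rows."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 columns, got {len(fields)}: {line!r}")
    user_id, store_id, rating = (int(x) for x in fields[:3])
    user_lng, user_lat, store_lng, store_lat = (float(x) for x in fields[3:])
    return Post(user_id, store_id, rating, user_lng, user_lat, store_lng, store_lat)
```

For example, `parse_post("1 3 5 40.57784227583149 27.618035602470936 40.576842275831495 27.617035602470935")` yields a `Post` with `user_id=1`, `store_id=3` and `rating=5`.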
I need to split this dataset so that, for every user, 80% of that user's posts go into the training set and the remaining 20% go into the test set.
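A per-user split like this needs only a few lines of standard-library Python. The sketch below assumes each row is a sequence whose first field is the user ID; `split_per_user`, the `0.2` default and the seed value are my own choices, not part of any existing tool. It groups rows by user, shuffles each user's posts with a seeded random generator (so the split is reproducible), and sends `round(20%)` of each user's posts to the test set.

```python
import random
from collections import defaultdict


def split_per_user(rows, test_fraction=0.2, seed=42):
    """Split rows so that each user contributes ~test_fraction of their posts to test.

    rows: iterable of sequences whose first element is the user ID.
    Returns (train, test) as two lists; rows themselves are not modified.
    """
    # Group the posts by user ID (first column).
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    rng = random.Random(seed)  # fixed seed -> same split on every run
    train, test = [], []
    for posts in by_user.values():
        posts = posts[:]  # copy so the grouped list order is not relied upon
        rng.shuffle(posts)
        n_test = round(len(posts) * test_fraction)
        test.extend(posts[:n_test])
        train.extend(posts[n_test:])
    return train, test
```

To use it on the file, something like `rows = [line.split() for line in open("posts.txt") if line.strip()]` (the filename is hypothetical) followed by `train, test = split_per_user(rows)` works, writing each list back out with `" ".join(row)`. With only a handful of posts per user the 80/20 ratio can only be approximate: a user with 6 posts gets 1 test post (about 17%), and users with 1 or 2 posts get none, so you may want to handle or exclude such users explicitly during evaluation.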
Searching on Google, I read about Weka. The tutorials I found seem (as far as I understand) to remove lines at random across the whole file, which is not what I want; I need the per-user split described above.
So my question is:
Is there a tool that does this? I am open to tools other than Weka. If Weka can do it, could somebody point me to some information or a tutorial?
EDIT
To give some more information about what I am trying to do: I am building a recommendation system, and to evaluate its accuracy I need to split the data, predict whether a user would like a location they have not visited yet, and then compare the predictions of my recommendation algorithm against the test set to compute precision, recall, F-measure, etc.
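The evaluation step can be sketched per user as follows, comparing the set of stores the recommender predicted against the stores the user actually liked in the test set. This is a minimal sketch: how "liked" is defined (for example, a test-set rating of 4 or more) is an assumption you would choose yourself, and `precision_recall_f1` and `macro_average` are hypothetical helper names.

```python
def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one user.

    recommended: store IDs the recommender predicted the user would like.
    relevant: store IDs the user actually liked in the test set
              (the threshold for "liked" is an assumption, e.g. rating >= 4).
    """
    recommended, relevant = set(recommended), set(relevant)
    hits = len(recommended & relevant)  # correctly predicted stores
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


def macro_average(per_user):
    """Average (precision, recall, f1) tuples over users, weighting users equally."""
    n = len(per_user)
    return tuple(sum(m[i] for m in per_user) / n for i in range(3))
```

For instance, recommending stores `[3, 5, 7, 9]` to a user whose liked test-set stores are `[5, 9, 10]` gives 2 hits, so precision is 0.5, recall is 2/3 and F1 is 4/7. Averaging per user (macro-averaging) keeps heavy posters from dominating the score; pooling all hits across users (micro-averaging) is the common alternative.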
What I have done so far is randomly remove 20% of each user's posts myself, but I suspect there are tools that do this better than my approach.
Thanks in advance!