I can only answer question 1.
The point of a training set is to develop a generalization, which you then evaluate on the test set. If you tweak anything about your learning algorithm and re-train/re-test without creating a new training/test split, you're really just learning the test set, not developing a generalization.
If your results are stable across reshuffled training/test splits, you are more likely to have learned a good generalization.
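As a minimal sketch of this idea (assuming scikit-learn is available; the iris data and logistic regression model are just placeholders for your own data and learner), you can repeatedly re-split the data, re-train, and check how much the test score varies:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

X, y = load_iris(return_X_y=True)

# Generate several independent random train/test splits. A stable score
# across splits suggests a genuine generalization rather than a model
# that has merely been tuned to one particular test set.
splitter = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])
    scores.append(model.score(X[test_idx], y[test_idx]))

print("mean accuracy: %.3f (std %.3f)" % (np.mean(scores), np.std(scores)))
```

A large standard deviation across the splits is a warning sign that your measured performance depends heavily on which examples happened to land in the test set.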
This is called the repeated holdout method; see http://www.umiacs.umd.edu/~joseph/classes/459M/year2010/Chapter5-testing-4on1.pdf for a brief discussion of several such evaluation methods. As alrikai suggested in the comments, this is the sort of material discussed on stats.stackexchange.com. See, for example: https://stats.stackexchange.com/questions/19048/what-is-the-difference-between-test-set-and-validation-set