I have been using CSV data in my scripts and want to split it into two datasets:
- Test Data
- Train Data
I want to split the data in an 85% / 15% ratio and output two CSV files, Test.csv and Train.csv.
I want to do this in base Python, without any external modules such as NumPy, SciPy, pandas, or scikit-learn. Can anyone help me with randomly sampling the data by percentage? The datasets I will be given may contain an arbitrary number of observations. So far I have only read about pandas and other modules that sample data by percentage, and I have not found a concrete solution to my problem.
I also want to retain the CSV header in both files, because the header makes the fields of each row accessible by name in further analysis.
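
To make the requirement concrete, here is a minimal sketch of the kind of split described above, using only the standard library's `csv` and `random` modules. The function name `split_csv`, its parameters, and the choice of 85% for Train.csv and 15% for Test.csv are assumptions for illustration. It also assumes the first row is the header and that the whole file fits in memory.

```python
import csv
import random


def split_csv(source_path, train_path="Train.csv", test_path="Test.csv",
              train_fraction=0.85, seed=None):
    """Randomly split a CSV into two files, writing the header to both.

    Hypothetical helper: names and defaults are illustrative assumptions.
    Returns a (train_row_count, test_row_count) tuple.
    """
    with open(source_path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)  # assumes the first row is the header
        rows = list(reader)    # assumes the data fits in memory

    # Shuffle so that the slices below are a random sample of the rows.
    random.Random(seed).shuffle(rows)

    # Works for any number of observations: the cut point is computed
    # from the actual row count.
    cut = round(len(rows) * train_fraction)

    for path, chunk in ((train_path, rows[:cut]), (test_path, rows[cut:])):
        with open(path, "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)  # keep the header in both outputs
            writer.writerows(chunk)

    return cut, len(rows) - cut
```

Calling `split_csv("data.csv", seed=42)` (where `data.csv` is a placeholder file name) would write Train.csv and Test.csv to the current directory. Passing a fixed `seed` makes the split reproducible; leaving it as `None` gives a different split on each run.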