I've got a CSV file that I want to split into three sets: 80% training, 10% dev-test and 10% test. The dev-test set won't be used further.
I've got it set up like:
import sklearn
import csv
with open('Letter.csv') as f:
    reader = csv.reader(f)
    annotated_data = [r for r in reader]
and for splitting:
import random
random.seed(1234)
random.shuffle(annotated_data)
But all the splitting examples I've seen only split into 2 sets, and I can't see where to specify the proportion for each partition, e.g. I want 80% training. Maybe I'm blind, but can anyone help me? I don't know how to use pandas.
Also, once I split it, how do I access the sets separately? For example, right now I can read all the records as a whole and count the number of entries, but once I split it I want to count how many records are in each set. Sorry if this deserves its own post, but I don't want to spam.
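To make the goal concrete, here is a rough sketch of the kind of result I'm after, using only plain list slicing and no pandas. The function name `split_three_ways` and the dummy `rows` list are made up for illustration (in my case `rows` would be `annotated_data`), and I'm not sure this is the recommended way to do it:

```python
import random


def split_three_ways(rows, train_frac=0.8, dev_frac=0.1, seed=1234):
    """Shuffle a copy of rows and slice it into train / dev-test / test lists.

    Hypothetical helper, not part of sklearn or any other library.
    """
    shuffled = list(rows)                   # copy so the input list is untouched
    random.Random(seed).shuffle(shuffled)   # seeded shuffle for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_dev = int(len(shuffled) * dev_frac)
    train = shuffled[:n_train]
    dev = shuffled[n_train:n_train + n_dev]
    test = shuffled[n_train + n_dev:]       # whatever is left goes to test
    return train, dev, test


# Dummy stand-in for annotated_data: 1000 fake records
rows = [[i, "label"] for i in range(1000)]
train, dev, test = split_three_ways(rows)

# Each set is an ordinary list, so len() gives its record count
print(len(train), len(dev), len(test))
```

With this, each set would just be a separate list, so I could count the records with `len()` and loop over each one on its own.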