I'm trying to read a fairly large CSV file with Pandas and split it into two random chunks, one containing 10% of the data and the other 90%.
Here's my current attempt:
rows = data.index
row_count = len(rows)
random.shuffle(list(rows))
data.reindex(rows)
training_data = data[row_count // 10:]
testing_data = data[:row_count // 10]
For some reason, sklearn throws this error when I try to use one of the resulting DataFrame objects in an SVM classifier:
IndexError: each subindex must be either a slice, an integer, Ellipsis, or newaxis
I think I'm doing it wrong. Is there a better way to do this?
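For reference, here is a minimal sketch of the split I'm after. Two things stand out in the attempt above: `random.shuffle(list(rows))` shuffles a temporary list and leaves `rows` unchanged, and `data.reindex(rows)` returns a new DataFrame instead of modifying `data`, so the rows are never actually reordered. The sketch uses `DataFrame.sample` to shuffle instead. The small `data` frame and its column names are hypothetical stand-ins for the real CSV, and I can't say whether this also resolves the `IndexError`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame loaded from the CSV file.
data = pd.DataFrame({"feature": np.arange(100), "label": np.arange(100) % 2})

# Shuffle all rows. random_state is fixed only so the sketch is repeatable.
shuffled = data.sample(frac=1, random_state=0)

# The first 10% of the shuffled rows become the test set, the rest training.
split_at = len(shuffled) // 10
testing_data = shuffled.iloc[:split_at]
training_data = shuffled.iloc[split_at:]
```

With 100 rows this gives 10 testing rows and 90 training rows with no overlap. scikit-learn's `train_test_split` in `sklearn.model_selection` may be another option for the same split.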