I need to split a pandas DataFrame that I read from a CSV file into three groups: training, test and validation. My problem is that I don't know in advance how many attributes the CSV has, because I'm working with many datasets with different numbers of attributes (some have 3 or 4, others have 40+). I need to split the data in these proportions:
- Training = 50%
- Test = 25%
- Validation = 25%
So if I have 5 attributes with 100 values each, I need to get 50 rows for training. How can I split all the attributes so that I end up with a new DataFrame for each group, always keeping the right proportions? I have already implemented the function to read the CSV. As you can see, it is generic: it only receives the path to the CSV and creates a new DataFrame from it.

    import pandas as pd

    class Entity:
        def __init__(self, path):
            self.data_frame = pd.read_csv(path)

        def get_value(self, attr):
            return self.data_frame[attr]

        def split_set(self):
            pass

This class is generic. I need to implement the split_set method to split the dataset. I'm just starting out with pandas and Python, so sorry if this is very easy to solve, but I can't think of a good way to do it.
Thanks in advance.
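
To make the goal concrete, here is a minimal sketch of the kind of split I have in mind. It is only one possible approach: shuffle the rows, then slice by position, so the number of columns never matters. The helper name `split_frame`, its parameters `train_frac`, `test_frac` and `seed`, and the toy `demo` frame are all hypothetical names made up for illustration; only `DataFrame.sample` and `DataFrame.iloc` are real pandas calls.

```python
import pandas as pd


def split_frame(df, train_frac=0.5, test_frac=0.25, seed=0):
    """Hypothetical helper: shuffle rows, then cut by position.

    Returns (train, test, validation). Validation gets whatever rows
    remain after the train and test slices.
    """
    # sample(frac=1) returns every row in random order. Columns are left
    # untouched, so this works for 3 attributes or 40+.
    shuffled = df.sample(frac=1, random_state=seed)

    n = len(shuffled)
    train_end = int(n * train_frac)
    test_end = train_end + int(n * test_frac)

    train = shuffled.iloc[:train_end]
    test = shuffled.iloc[train_end:test_end]
    validation = shuffled.iloc[test_end:]
    return train, test, validation


# Toy data standing in for a CSV: 100 rows, 5 attributes.
demo = pd.DataFrame({f"attr{i}": range(100) for i in range(5)})
train, test, validation = split_frame(demo)
print(len(train), len(test), len(validation))  # 50 25 25
```

Under these assumptions, split_set in the Entity class could simply return `split_frame(self.data_frame)`. When the row count is not divisible by four, the leftover rows end up in the validation set, so the proportions are only approximate.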