I want to split a raw DataFrame into three subsets: train, test, and validation.
I see three possible solutions, but I am afraid they are not correct and may create a bottleneck.
1) Store the subsets in a dictionary keyed by name
my_dict = {'train':raw_df.loc[start:end], 'test':raw_df.loc[start:end],
'val':raw_df.loc[start:end]}
2) Create three separate DataFrames
train_df = raw_df.loc[start:end]
test_df = raw_df.loc[start:end]
val_df = raw_df.loc[start:end]
3) Add a new column that holds one of three values, chosen at random
raw_df['train/test/val'] = pd.Series('train', index=raw_df.index)
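To make option 3 concrete, here is a minimal sketch of what I have in mind. The toy data, the column name `split`, the seed, and the 60/20/20 proportions are all assumptions made up for illustration; this is not a claim about the recommended way to do it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for raw_df.
raw_df = pd.DataFrame({"x": np.arange(10), "y": np.arange(10) * 2})

# Seeded generator so the random assignment is reproducible.
rng = np.random.default_rng(0)

# Draw one label per row; the 60/20/20 proportions are an assumption.
labels = rng.choice(["train", "test", "val"], size=len(raw_df), p=[0.6, 0.2, 0.2])
raw_df["split"] = labels

# Select a subset on demand with a boolean mask.
train_df = raw_df[raw_df["split"] == "train"]
```

With this layout the data stays in one DataFrame, and each subset is selected only when it is needed.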
Also, will keeping DataFrames in a dictionary or list create a bottleneck, i.e. lose the performance advantages of a DataFrame? Adding a new column, in theory, increases the dimensionality of the data. Creating new DataFrames is, I think, the worst option because it will consume a lot of memory.
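For reference, this is a small sketch of how I would check the dictionary and memory concerns. The toy data and the slice boundaries (6 and 8) are assumptions. It only shows that a dictionary holds references to the DataFrame objects placed in it, and how to measure each subset's footprint; it does not settle whether the slices themselves duplicate the underlying data, which I believe depends on the pandas version and its copy-on-write settings.

```python
import pandas as pd

# Hypothetical toy data standing in for raw_df.
raw_df = pd.DataFrame({"x": range(10)})

# Positional slices; the boundaries 6 and 8 are assumptions.
train_df = raw_df.iloc[:6]
test_df = raw_df.iloc[6:8]
val_df = raw_df.iloc[8:]

# The dictionary stores references to the existing objects, not new copies.
splits = {"train": train_df, "test": test_df, "val": val_df}
same_object = splits["train"] is train_df

# Memory reported by pandas for each subset, in bytes.
sizes = {name: int(part.memory_usage(deep=True).sum()) for name, part in splits.items()}
```

Here `same_object` is `True`, so putting the DataFrames in a dictionary does not by itself create another copy of them.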