I have a dataset of around 15,500 rows. The dataset consists of two columns: a text column (the independent variable) and an output column (the dependent variable). Output is binary (0 or 1). Around 9,500 rows have a value in the output column, so I can use them for training. The remaining 6,000 rows have no output value, and I want to use them for testing. All 15,500 rows are in a single file.
I created a model definition file in which I used the parallel_cnn encoder for the text column. I used the following command to train and test the model:
ludwig experiment --dataset dataset_name.csv --config_file model_definitions.yml
The problem is that this command does not tell Ludwig to train on the first 9,500 rows and test on the remaining rows. Is there an argument in Ludwig that specifies which rows to use for training and which to use for testing? Or is there a better way to do the same task?
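For reference, here is a minimal pandas sketch of how I could separate the rows myself before handing anything to Ludwig. It is only an illustration: the column names `text` and `output`, the helper `split_by_label`, and the output file names are assumptions on my part, and I have not confirmed how Ludwig would best consume the two resulting files.

```python
import io

import pandas as pd


def split_by_label(df, label_col="output"):
    """Return (labeled, unlabeled) rows, based on whether label_col has a value.

    `label_col` defaults to "output", which is an assumed column name.
    """
    has_label = df[label_col].notna()
    return df[has_label], df[~has_label]


# Tiny stand-in for the real file: two labeled rows and one unlabeled row.
sample_csv = io.StringIO(
    "text,output\n"
    "good product,1\n"
    "bad service,0\n"
    "not yet rated,\n"
)
sample = pd.read_csv(sample_csv)
labeled, unlabeled = split_by_label(sample)

# With the real data it would look roughly like this
# (the two output file names are hypothetical):
# df = pd.read_csv("dataset_name.csv")
# labeled, unlabeled = split_by_label(df)
# labeled.to_csv("labeled_rows.csv", index=False)
# unlabeled.to_csv("unlabeled_rows.csv", index=False)
```

This splits on whether the output value is missing rather than on row position, so it would still work if the labeled rows are not exactly the first 9,500.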