
I have a directory with 12 CSV files. I read them with TensorFlow using the following code:

import tensorflow as tf

a = [0, 2, 3, 4, 5, 19, 23, 32, 39, 40, 42, 50, 51, 53, 56, 65, 66, 67, 68, 69]
data = tf.data.experimental.make_csv_dataset("./raw/*",
                                             batch_size=2000,
                                             select_columns=a,
                                             label_name="Cancelled",
                                             num_epochs=30,
                                             num_parallel_reads=2)

How can I split this dataset into training and testing datasets?

I am quite new to TensorFlow and have no idea how to work with prefetched datasets.

Shawn Brar

1 Answer


You can use:

train_size = int(0.7 * DATASET_SIZE)
test_size  = int(0.3 * DATASET_SIZE)

train_dataset = data.take(train_size)
test_dataset  = data.skip(train_size)

The training dataset gets the first (0.7 * DATASET_SIZE) elements and the rest goes to testing. Note that because make_csv_dataset returns batched data, each element here is a batch of 2000 rows, so take() and skip() count batches, not individual rows.

Take: Creates a Dataset with at most count elements from this dataset.

Skip: Creates a Dataset that skips count elements from this dataset.
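As a quick illustration of those two operations on a toy range dataset (standing in for the CSV pipeline):

```python
import tensorflow as tf

# Any tf.data.Dataset works the same way; a range dataset keeps it simple
ds = tf.data.Dataset.range(10)

first_three = list(ds.take(3).as_numpy_iterator())  # the first 3 elements
remaining   = list(ds.skip(3).as_numpy_iterator())  # everything after them

print(first_three)  # [0, 1, 2]
print(remaining)    # [3, 4, 5, 6, 7, 8, 9]
```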

Note, however, that take() and skip() require knowing the size of your dataset beforehand.
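If the size isn't known (tf.data.experimental.cardinality often reports UNKNOWN for CSV datasets), one option is to count the batches with a single pass over the data. This is a sketch on a toy batched dataset, assuming one full pass is affordable:

```python
import tensorflow as tf

# Toy batched dataset standing in for the CSV pipeline:
# 100 elements in batches of 10 -> 10 batches
data = tf.data.Dataset.range(100).batch(10)

# Count the batches by reducing over the dataset once
DATASET_SIZE = data.reduce(0, lambda count, _: count + 1).numpy()

train_size = int(0.7 * DATASET_SIZE)
train_dataset = data.take(train_size)
test_dataset = data.skip(train_size)

print(DATASET_SIZE)  # 10
```

With the sizes in hand, the take()/skip() split from the answer above applies unchanged.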

Will
  • Taken from https://stackoverflow.com/questions/48213766/split-a-dataset-created-by-tensorflow-dataset-api-in-to-train-and-test and https://stackoverflow.com/questions/51125266/how-do-i-split-tensorflow-datasets/58452268#58452268 – AloneTogether Nov 03 '22 at 09:44
  • But what if I don't know the size of my dataset. – Shawn Brar Nov 03 '22 at 09:47