
I load my dataset like this:

self.train_ds = tf.data.experimental.make_csv_dataset(
    self.config["input_paths"]["data"]["train"],
    batch_size=self.params["batch_size"],
    shuffle=False,
    label_name="tags",
    num_epochs=1,
)

My TextVectorization layer looks like this:

vectorizer = tf.keras.layers.TextVectorization(
    standardize=code_standaridization,
    split="whitespace",
    output_mode="int",
    output_sequence_length=params["input_dim"],
    max_tokens=100_000,
)

And I thought this would be enough:

vectorizer.adapt(data_provider.train_ds)

But it's not; I get this error:

TypeError: Expected string, but got Tensor("IteratorGetNext:0", shape=(None, None), dtype=string) of type 'Tensor'.

Can I somehow adapt my vectorizer on a TensorFlow dataset?

1 Answer

Most probably the issue is that your train_ds is batched (you created it with batch_size) and you don't call .unbatch() before you try to adapt.

You have to do:

vectorizer.adapt(train_ds.unbatch().map(lambda x, y: x).batch(BATCH_SIZE))

The .unbatch() call resolves the error you are currently seeing, and the .map() is needed because the TextVectorization layer adapts on (batches of) plain strings, so you have to drop the labels and keep only the text from your dataset.
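
If it helps, here is a minimal end-to-end sketch of the adapt step, assuming the CSV has a single text column named "code" and using placeholder values for the path, batch size, and sequence length (swap in your own config values and your code_standaridization function):

import tensorflow as tf

BATCH_SIZE = 32  # placeholder for params["batch_size"]

# Same loading call as in the question, with a placeholder path.
train_ds = tf.data.experimental.make_csv_dataset(
    "train.csv",
    batch_size=BATCH_SIZE,
    shuffle=False,
    label_name="tags",
    num_epochs=1,
)

# make_csv_dataset yields (features, label) pairs where features is an
# OrderedDict of column tensors, so select the text column by name
# (assumed here to be "code") instead of passing the whole dict to adapt().
text_only_ds = (
    train_ds
    .unbatch()                                       # back to individual examples
    .map(lambda features, label: features["code"])   # keep only the raw strings
    .batch(BATCH_SIZE)                               # re-batch so adapt() runs faster
)

vectorizer = tf.keras.layers.TextVectorization(
    split="whitespace",
    output_mode="int",
    output_sequence_length=128,  # placeholder for params["input_dim"]
    max_tokens=100_000,
)
vectorizer.adapt(text_only_ds)

Once adapted, you can reuse the same .map() pattern in the training pipeline to turn the raw strings into token ids, or use the vectorizer as the first layer of the model.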