I'm trying to load the "iris" dataset directly from TensorFlow Datasets and I'm stuck. I'm used to working with CSVs.

import tensorflow as tf
import tensorflow_datasets as tfds

data = tfds.load("iris",split='train[:80%]', as_supervised=True)
data = data.batch(10)
features, labels = data

I don't know how I'm supposed to separate the data into features X and labels y. The labels are in a different tensor from the features, but I don't know how to access them. I'd like to one-hot encode the labels and feed them into the model, but I'm stuck here.

The TensorFlow docs are sparse on how to do this. Any help is much appreciated.

Nicolas Gervais

1 Answer

You can one-hot encode your labels by calling tf.one_hot inside the dataset's .map() method, like this:

data = data.batch(10).map(lambda x, y: (x, tf.one_hot(y, depth=3)))

print(next(iter(data))[1])
<tf.Tensor: shape=(10, 3), dtype=float32, numpy=
array([[1., 0., 0.],
       [0., 0., 1.],
       [0., 1., 0.],
       [0., 0., 1.],
       [1., 0., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [0., 1., 0.],
       [1., 0., 0.],
       [0., 0., 1.]], dtype=float32)>
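
On the original question of separating X and y: `features, labels = data` fails because a batched dataset yields many (features, labels) pairs, so it cannot be unpacked into two names. Iterating gives one pair per batch. Here is a minimal sketch of that, using a small hand-written stand-in for iris (the six rows and their labels are made up for illustration, so it runs without downloading anything):

```python
import tensorflow as tf

# Hypothetical stand-in for the iris data: 6 rows, 4 features, labels in {0, 1, 2}.
features = tf.constant([[5.1, 3.5, 1.4, 0.2],
                        [7.0, 3.2, 4.7, 1.4],
                        [6.3, 3.3, 6.0, 2.5],
                        [4.9, 3.0, 1.4, 0.2],
                        [6.4, 3.2, 4.5, 1.5],
                        [5.8, 2.7, 5.1, 1.9]], dtype=tf.float32)
labels = tf.constant([0, 1, 2, 0, 1, 2], dtype=tf.int64)

# Same element structure as tfds.load(..., as_supervised=True): (x, y) pairs.
data = tf.data.Dataset.from_tensor_slices((features, labels))
data = data.batch(3).map(lambda x, y: (x, tf.one_hot(y, depth=3)))

# Iterate to get one (X, y) batch at a time instead of unpacking the dataset.
for X, y in data.take(1):
    first_X, first_y = X.numpy(), y.numpy()

print(first_X.shape, first_y.shape)  # (3, 4) (3, 3)
print(first_y)
```

Each batch arrives as a tuple, so X and y are ordinary tensors that you can inspect or convert with .numpy().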

Fully-working minimal example:

import tensorflow as tf
import tensorflow_datasets as tfds

data = tfds.load("iris",split='train[:80%]', as_supervised=True)
data = data.batch(10).map(lambda x, y: (x, tf.one_hot(y, depth=3))).repeat()

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(8, activation='relu'),
    tf.keras.layers.Dense(16, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', 
    metrics=['categorical_accuracy'])

history = model.fit(data, steps_per_epoch=8, epochs=10)
Epoch 10/10
1/8 [==>...........................] - ETA: 0s - loss: 0.8848 - cat_acc: 0.6000
8/8 [==============================] - 0s 4ms/step - loss: 0.8549 - cat_acc: 0.5250
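
A note on steps_per_epoch: because the dataset is repeated, Keras needs to be told how many batches make up an epoch. Rather than hard-coding the number, you can ask the dataset for it before calling .repeat(). The sketch below assumes the 80% split holds 120 rows (iris has 150 examples, as far as I know) and uses random placeholder values instead of the real data:

```python
import tensorflow as tf

# Hypothetical stand-in with the assumed size of train[:80%] of iris (120 rows).
n_rows = 120
features = tf.random.uniform((n_rows, 4))
labels = tf.random.uniform((n_rows,), maxval=3, dtype=tf.int64)

data = tf.data.Dataset.from_tensor_slices((features, labels))
data = data.batch(10).map(lambda x, y: (x, tf.one_hot(y, depth=3)))

# Number of batches in one full pass; query this before adding .repeat().
steps = int(data.cardinality().numpy())
print(steps)  # 12
```

Under that assumption a full pass is 12 batches, so steps_per_epoch=8 covers only part of the data in each epoch. Alternatively, drop both .repeat() and steps_per_epoch and let fit() run through the finite dataset once per epoch.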
Nicolas Gervais
  • beautiful. that explains it. i didn't fully understand the map function and purpose. thank you. – AI-Lottery Winner Aug 04 '20 at 16:01
  • hmm, has something changed? When I do that I get `TypeError: () missing 1 required positional argument: 'y'` – Olli May 24 '21 at 08:41
  • @Olli what TensorFlow version do you have? It still works for me for `tf == 2.3` and `tfds == 4.3` – Nicolas Gervais May 24 '21 at 13:05
  • @NicolasGervais tensorflow 2.4 it also still works, my fault was not setting as_supervised = True which returned a feature dictionary instead. Apologies for the confusion! – Olli May 24 '21 at 13:13
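
To illustrate the error discussed in the comments: without as_supervised=True, each element is a single dictionary rather than an (x, y) tuple, so a two-argument lambda receives only one argument. A rough sketch of handling the dictionary form follows. The key names "features" and "label" are what I believe the iris dataset uses, but treat them as an assumption and check data.element_spec on your own dataset. The two rows are made-up placeholders.

```python
import tensorflow as tf

# Hypothetical stand-in for tfds.load("iris", ...) WITHOUT as_supervised=True:
# each element is a dict (key names assumed, verify with data.element_spec).
raw = tf.data.Dataset.from_tensor_slices({
    "features": tf.constant([[5.1, 3.5, 1.4, 0.2],
                             [6.3, 3.3, 6.0, 2.5]], dtype=tf.float32),
    "label": tf.constant([0, 2], dtype=tf.int64),
})

def to_supervised(example):
    # The map function receives the whole dict as its single argument.
    return example["features"], tf.one_hot(example["label"], depth=3)

data = raw.batch(2).map(to_supervised)

X, y = next(iter(data))
one_hot_rows = y.numpy().tolist()
print(one_hot_rows)  # [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
```

Passing as_supervised=True, as in the answer, is the simpler route. The dictionary form is mainly useful when a dataset carries extra fields you want to keep.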