import tensorflow_datasets as tfds

train_ds = tfds.load('cifar100', split='train[:90%]').shuffle(1024).batch(32)
val_ds = tfds.load('cifar100', split='train[-10%:]').shuffle(1024).batch(32)

I want to convert train_ds and val_ds into something like this: x_train, y_train and x_val, y_val (x for images, y for labels). The Keras API provides a train/test split (this seems to be the case in sklearn too), but I do not want to use any test data at all here.

I have tried this, but it didn't work (and I do understand why this doesn't work, but I don't know how else I can convert my training data to images and labels):

x_train = train_ds['image']

# TypeError: 'BatchDataset' object is not subscriptable
dedede
  • [Does this](https://stackoverflow.com/questions/56226621/how-to-extract-data-labels-back-from-tensorflow-dataset) answer your question? – Frightera Feb 22 '21 at 18:37
  • I tried all the proposed solutions, but I keep running into this error `ValueError: too many values to unpack (expected 2)` – dedede Feb 22 '21 at 18:46
  • and I think what is being asked in that question is different from my question here. – dedede Feb 22 '21 at 19:30

2 Answers


This may not be the best way; I created lists first so I could inspect them. I think you want something like this:

import numpy as np
import tensorflow_datasets as tfds

train_ds = tfds.load('mnist', split='train[:90%]')

# as_numpy converts the dataset into an iterable of NumPy-backed dicts
train_examples_labels = tfds.as_numpy(train_ds)

x_train = []
y_train = []

for features_labels in train_examples_labels:
    x_train.append(features_labels['image'])
    y_train.append(features_labels['label'])

features_labels is a dictionary here:

features_labels.keys()
dict_keys(['image', 'label'])

Afterwards you can convert them into NumPy arrays:

x_train = np.array(x_train, dtype = 'float32')
y_train = np.array(y_train, dtype = 'float32')
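The list-to-array step above can be exercised without downloading a dataset. Here is a minimal sketch where hand-made dicts stand in for the examples yielded by `tfds.as_numpy` (the 28×28×1 shapes and labels are dummy placeholders, not real MNIST data):

```python
import numpy as np

# Stand-ins for the dicts that tfds.as_numpy(train_ds) would yield
fake_examples = [
    {'image': np.zeros((28, 28, 1), dtype='uint8'), 'label': 3},
    {'image': np.ones((28, 28, 1), dtype='uint8'), 'label': 7},
]

x_train = []
y_train = []

for features_labels in fake_examples:
    x_train.append(features_labels['image'])
    y_train.append(features_labels['label'])

# Stacking the per-example arrays adds a leading batch dimension
x_train = np.array(x_train, dtype='float32')
y_train = np.array(y_train, dtype='float32')

print(x_train.shape)  # (2, 28, 28, 1)
print(y_train)        # [3. 7.]
```

The same pattern scales to the full split: each appended `image` keeps its own shape, and `np.array` stacks them along a new first axis.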
Frightera
  • I used your method but then when compared to the train data loaded by Keras API: ```(x_train_keras, y_train_keras), (x_test_keras, y_test_keras) = cifar100.load_data() y_train_keras == y_train array([[False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], ..., [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False], [False, False, False, ..., False, False, False]])``` – dedede Feb 22 '21 at 20:09
  • They might not be in the same order. I checked the shapes, they were same. – Frightera Feb 22 '21 at 20:19
  • That's what I thought too, but I haven't shuffled my data so I was wondering why the order is different – dedede Feb 22 '21 at 20:39
  • also, for some reason I'm unable to convert the final arrays to float32 – dedede Feb 22 '21 at 20:45
  • I checked the class distributions, they are equal just not ordered exactly. I am able to convert them as `float32`, check the edited answer. – Frightera Feb 22 '21 at 20:49
  • I found out that this was caused by batching the data during import. thanks for the help! – dedede Feb 22 '21 at 20:53

I found a better solution:

train_ds, val_ds = tfds.load(name="cifar100", split=('train[:90%]', 'train[-10%:]'),
                             batch_size=-1, as_supervised=True)

x_train, y_train = tfds.as_numpy(train_ds)
x_val, y_val = tfds.as_numpy(val_ds)
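As a sanity check on what this returns: `as_supervised=True` makes each split a (image, label) tuple instead of a dict, and `batch_size=-1` loads the whole split as a single batch, so the tuple unpacks directly into two arrays. The shape of that result can be mimicked with dummy NumPy arrays (scaled down from the real 45,000-example split):

```python
import numpy as np

# Dummy stand-in for tfds.as_numpy(train_ds) with as_supervised=True and
# batch_size=-1: one (images, labels) tuple covering the whole split.
dummy_train = (np.zeros((10, 32, 32, 3), dtype='uint8'),
               np.zeros((10,), dtype='int64'))

x_train, y_train = dummy_train  # plain tuple unpacking, no iteration needed

print(x_train.shape, y_train.shape)  # (10, 32, 32, 3) (10,)
```

This is why the earlier `ValueError: too many values to unpack` disappears: with `as_supervised=True` there are exactly two things to unpack per split.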
dedede