
I want to create a TensorFlow dataset and feed it with images as arrays (dtype=uint8) and labels as strings. The images and their labels are stored in a dataframe, in columns named Image as Array and Labels.

Image as Array (type: array) | Labels (type: string)
img_1                        | 'ok'
img_2                        | 'not ok'
img_3                        | 'ok'
img_4                        | 'ok'

My challenge: I don't know how to build the Dataset from a dataframe; most tutorials load the data from a directory instead.

Thanks in advance. I hope you can help me load the images into the dataset.

Christian01

2 Answers


You can pass the dataframe columns directly to tf.data.Dataset.from_tensor_slices:

import tensorflow as tf
import numpy as np
import pandas as pd


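# 100 random float images of shape (64, 64, 3) with alternating labels
df = pd.DataFrame(data={'images': [np.random.random((64, 64, 3)) for _ in range(100)],
                        'labels': ['ok', 'not ok']*50})

# list(...) turns the object column into a list of equally shaped arrays,
# which TensorFlow packs into one tensor before slicing it row by row
dataset = tf.data.Dataset.from_tensor_slices((list(df['images'].values), df['labels'].values)).batch(2)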

for x, y in dataset.take(1):
  print(x.shape, y)
# (2, 64, 64, 3) tf.Tensor([b'ok' b'not ok'], shape=(2,), dtype=string)
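A variant closer to the question's data, offered as a sketch rather than part of the original answer: it assumes uint8 images of shape (70, 162) (the shape mentioned in the comments below) stored in columns named 'Image as Array' and 'Labels', and the data itself is randomly generated for illustration. Stacking the column with np.stack produces one dense (N, 70, 162) array, so the uint8 dtype is preserved instead of the float64 used above.

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Hypothetical data mirroring the question: uint8 arrays of shape (70, 162)
# and string labels in columns named 'Image as Array' and 'Labels'.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'Image as Array': [rng.integers(0, 256, size=(70, 162)).astype(np.uint8) for _ in range(4)],
    'Labels': ['ok', 'not ok', 'ok', 'ok'],
})

# Stack the per-row arrays into a single (N, 70, 162) uint8 array so TensorFlow
# receives one dense tensor instead of a pandas object column.
images = np.stack(df['Image as Array'].to_list())
labels = df['Labels'].to_numpy(dtype=str)

dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(2)

for x, y in dataset.take(1):
    print(x.shape, x.dtype, y)
```

Each batch then holds two uint8 images and a string tensor with their two labels.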
AloneTogether
  • Thank you for the answer. The shape of my image arrays is (70, 162), so I think my output ```(70, 162) tf.Tensor(b'IO-Image', shape=(), dtype=string)``` looks good. What does the 'b' in ```tf.Tensor(b...``` mean? – Christian01 May 25 '22 at 12:01
  • See here: https://stackoverflow.com/questions/6269765/what-does-the-b-character-do-in-front-of-a-string-literal#:~:text=The%20b%20denotes%20a%20byte,Bytes%20are%20the%20actual%20data. – AloneTogether May 25 '22 at 12:04
  • Yes, I understand: the string is converted to bytes. Your output shows a list of labels, but mine shows only one label, even though the dataframe's unique() method shows two labels. What is the reason? – Christian01 May 25 '22 at 12:11
  • You see two labels because I am using a batch size of 2. – AloneTogether May 25 '22 at 12:15
  • Ah, yes, my mistake. I forgot the ```.batch(2)```. Now I get the same output. Thank you :) – Christian01 May 25 '22 at 12:18
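The two points from this comment thread (the b'' prefix and the effect of .batch) can be seen side by side in a small sketch. The all-zero uint8 images of shape (70, 162) are placeholder data assumed for illustration. The b prefix means the label comes back as a Python bytes object, which .decode('utf-8') turns into a str.

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 4 blank uint8 images and their string labels.
images = np.zeros((4, 70, 162), dtype=np.uint8)
labels = np.array(['ok', 'not ok', 'ok', 'ok'])

unbatched = tf.data.Dataset.from_tensor_slices((images, labels))
batched = unbatched.batch(2)

# Without .batch(): one image and one scalar label per element.
x, y = next(iter(unbatched))
print(x.shape, y.shape)            # (70, 162) ()
# y.numpy() is a bytes object such as b'ok'; decode it to get a str.
print(y.numpy().decode('utf-8'))   # ok

# With .batch(2): two images and a vector of two labels per element.
xb, yb = next(iter(batched))
print(xb.shape, yb.shape)          # (2, 70, 162) (2,)
print([s.decode('utf-8') for s in yb.numpy()])  # ['ok', 'not ok']
```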

Another possibility is to use Dataset.range to create a dataset of indices and then map each index to its image and label.

import numpy as np
import tensorflow as tf

# example data: 4 images of shape (2, 2, 2) and their labels
img = np.random.rand(4, 2, 2, 2)
label = np.array(['ok', 'not ok', 'ok', 'ok'])

# convert to tf constants
img = tf.constant(img)
label = tf.constant(label)

# create a dataset of indices 0..3
dataset = tf.data.Dataset.range(len(label))
# map each index to its (image, label) pair
dataset = dataset.map(lambda x: (img[x, :, :, :], label[x]))

Output:

<MapDataset element_spec=(TensorSpec(shape=(2, 2, 2), dtype=tf.float64, name=None), TensorSpec(shape=(), dtype=tf.string, name=None))>

Printing the label (second element) of each item:

for image, lbl in dataset:
    print(lbl)

tf.Tensor(b'ok', shape=(), dtype=string)
tf.Tensor(b'not ok', shape=(), dtype=string)
tf.Tensor(b'ok', shape=(), dtype=string)
tf.Tensor(b'ok', shape=(), dtype=string)
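To check that this index-based pipeline yields the same elements as slicing the arrays directly with tf.data.Dataset.from_tensor_slices (the approach in the other answer), a minimal sketch can iterate both datasets in lockstep and compare them element by element. The random data is illustrative only.

```python
import numpy as np
import tensorflow as tf

img = np.random.rand(4, 2, 2, 2)
label = np.array(['ok', 'not ok', 'ok', 'ok'])

# Index-based approach from this answer: map each index to (image, label).
img_t = tf.constant(img)
label_t = tf.constant(label)
indexed = tf.data.Dataset.range(len(label)).map(lambda i: (img_t[i], label_t[i]))

# Direct slicing of the same arrays, for comparison.
sliced = tf.data.Dataset.from_tensor_slices((img, label))

# Compare both pipelines element by element, in order.
matches = []
for (a_img, a_lbl), (b_img, b_lbl) in zip(indexed, sliced):
    same = np.array_equal(a_img.numpy(), b_img.numpy()) and a_lbl.numpy() == b_lbl.numpy()
    matches.append(bool(same))
print(matches)  # [True, True, True, True]
```

Both produce identical (image, label) pairs in the same order; from_tensor_slices is shorter, while the index-based version makes it easy to add per-index logic inside map.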
Stanislav D.