I am aware that in TensorFlow, a tf.string tensor is basically a byte string. I need to perform some operation on a filename that is stored in a queue created with tf.train.string_input_producer().

A small snippet is shown below:

 key, value = reader.read(filename_queue)
 filename = value.eval(session=sess)
 print(filename)

However, as a byte string, it gives output like the following:

b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00\xff\xdb\x00C\x00\x08\x06\x06\x07\x06\x05\x08\x07\x07\x07\t\t\x08'

I tried to convert it using:

filename = tf.decode_raw(filename, tf.uint8)
filename = ''.join(chr(i) for i in filename)

However, Tensor objects are not iterable, and hence this fails.

Where am I going wrong?

Is easy conversion of a tf.string to a Python string a missing feature in TensorFlow, or is there some other feature I am not aware of?

More Info

The filename_queue has been prepared as follows:

train_set = ['file1.jpg', 'file2.jpg'] # Truncated for illustration
filename_queue = tf.train.string_input_producer(train_set, num_epochs=10, seed=0, capacity=1000)                  

Giovan Cruz
Ujjwal
  • If you'd like to work with the string in Python, you need to [execute the TensorFlow graph](https://www.tensorflow.org/get_started/basic_usage) first. – Allen Lavoie Feb 09 '17 at 20:53
  • As you can see I have executed the graph inside a session. – Ujjwal Feb 09 '17 at 20:59
  • Your second approach is fine (`decode_raw`), you just need to evaluate the Tensor first. Although I have a feeling the reason you're not getting the result you want in the first approach is that this is binary data rather than a sensible filename. – Allen Lavoie Feb 09 '17 at 21:05
  • The tensor has been evaluated using `eval()`. After that, when I use decode_raw, I get the error stated in the question. As for the validity of the data, it is valid since `tf.train.string_input_producer()` has been fed a list of Python strings (which are valid filenames). – Ujjwal Feb 09 '17 at 21:36
  • Hi, @Ujjwal, have you ever solved this problem? I'm looking for the solution. Thanks. – mining Jun 13 '17 at 22:12

4 Answers

In TensorFlow 2.0.0, it can be done in the following way:

import tensorflow as tf

my_str = tf.constant('Hello World')
my_str_npy = my_str.numpy()  # requires eager execution (the TF 2.x default)

print(my_str_npy)        # a bytes object
print(type(my_str_npy))

This converts the string tensor into a Python bytes object; call .decode() on it to obtain a str.
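
As a small follow-up sketch, the bytes object can then be decoded into an ordinary Python string. The bytes literal below is only a stand-in for the result of `my_str.numpy()` (an assumption, so the snippet runs without TensorFlow), and UTF-8 is assumed as the encoding:

```python
# Stand-in for my_str.numpy(); assumed here so the snippet runs without TensorFlow.
my_str_npy = b'Hello World'

# Decode the bytes into a regular Python str (UTF-8 is an assumption).
my_str_py = my_str_npy.decode('utf-8')

print(my_str_py)        # Hello World
print(type(my_str_py))  # <class 'str'>
```

If the bytes are not valid text in the chosen encoding (for example, raw image data), `decode` raises a `UnicodeDecodeError` instead of returning a string.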

Aalok_G
  • That's easy enough, but I couldn't have guessed it; hm, so numpy() also covers non-numeric data – Dee Oct 03 '19 at 04:48
  • AttributeError: 'Tensor' object has no attribute 'numpy' in tensorflow 2.1.0 – Swaroop Apr 09 '20 at 12:07
  • AttributeError: 'Tensor' object has no attribute 'numpy' in tensorflow 2.2.0 – Shark Deng Jun 05 '20 at 00:26
  • Anyone manage to solve this for new versions of Tensorflow? – hardanger Jun 20 '20 at 19:09
  • I can confirm that the above conversion code does work: `a = tf.constant('hello')` followed by `a.numpy().decode('ascii')` will return a python string with TF2.2. However, I think this only works in eager execution mode. For example, it works fine at the command line, but may not work in graph mode when the graph is being defined, but will work when the graph is being executed. – Hephaestus Jul 14 '20 at 23:11
  • You may use [tf.py_function](https://www.tensorflow.org/api_docs/python/tf/py_function) to get access to the .numpy method (or use a wrapper function as in Shark's answer). You may also need a set_shape call: https://stackoverflow.com/a/42590869/7647325 – savfod Jul 04 '22 at 14:42

With tf.data.Dataset, you can do this by wrapping the Python function in tf.numpy_function:

import skimage.io
import tensorflow as tf

pattern = "../prostate-cancer-grade-assessment/train_images/*.tiff"
# `files` was not defined in the original snippet; assumed to be the list of
# matching paths, used only to size the shuffle buffer.
files = tf.io.gfile.glob(pattern)

def get_img(path):
    # Inside tf.numpy_function the path arrives as bytes, so decode it to str.
    path = bytes.decode(path)
    img = skimage.io.MultiImage(path)[-1]
    print(img.shape, type(img))
    return path

def wrap_get_img(path):
    # The wrapper hands get_img a concrete value instead of a symbolic tf.Tensor.
    return tf.numpy_function(get_img, [path], [tf.string])

dataset = tf.data.Dataset.list_files(pattern) \
            .repeat() \
            .shuffle(buffer_size=len(files)) \
            .map(wrap_get_img)

for x in dataset:
    print(x)  # eager tensor from which the string can be read
    break
Shark Deng

key, value = reader.read(filename_queue)

Here, the reader just reads the file you give it, so value is the content of the file, not the filename. Output key instead, and you get the filename.
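
A quick way to check this from the output in the question is sketched below: the printed bytes begin with the JPEG start-of-image marker, which is what file content would look like and what a filename would not. The helper `looks_like_jpeg` is a hypothetical name introduced only for illustration:

```python
# First bytes of the output shown in the question.
value = b'\xff\xd8\xff\xe0\x00\x10JFIF\x00\x01'

# JPEG data starts with the bytes FF D8 FF.
JPEG_MAGIC = b'\xff\xd8\xff'

def looks_like_jpeg(data: bytes) -> bool:
    """Return True if data begins with the JPEG start-of-image marker."""
    return data.startswith(JPEG_MAGIC)

print(looks_like_jpeg(value))         # True
print(looks_like_jpeg(b'file1.jpg'))  # False
```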

mxl

Use the as_text function from compat (in tensorflow.python.util) to convert TensorFlow's byte string to text, i.e.:

from tensorflow.python.util import compat

filename = compat.as_text(filename)
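
For readers without TensorFlow at hand, the idea can be sketched in plain Python. The helper `to_text` below is hypothetical and is not TensorFlow's implementation; it only illustrates a conversion that accepts either bytes or str, assuming UTF-8:

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, str):
        return value
    raise TypeError(f'Expected bytes or str, got {type(value).__name__}')

print(to_text(b'file1.jpg'))  # file1.jpg
print(to_text('file2.jpg'))   # file2.jpg
```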

wontleave