
I'm working through the TensorFlow Load pandas.DataFrame tutorial, and I'm trying to modify the output from a code snippet that creates the dictionary slices:

dict_slices = tf.data.Dataset.from_tensor_slices((df.to_dict('list'), target.values)).batch(16)

for dict_slice in dict_slices.take(1):
    print(dict_slice)

I find the following output sloppy, and I want to put it into a more readable table format.

[Image: dictionary output from the TensorFlow tutorial]

I tried to format the for loop based on this recommendation:

[Image: trying a basic for loop]

This gave me an error saying that the BatchDataset object is not subscriptable:

[Image: error messages from the for loop modification]

Then I tried to use the range and len functions on dict_slices, so that i would be an integer index and not a slice:

[Image: modifying dict_slices with range and len]

This gave me the following error. As I understand it, that is because dict_slices is still an array, and each iteration yields one vector of the array, not one index of the vector:

[Image: error message]

J. Fox

2 Answers


Refer here for the solution. To summarize, we need to use as_numpy_iterator:

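example = list(dict_slices.as_numpy_iterator())
# Each element is a (features, target) pair, so unpack it before using a column name.
features, labels = example[0]
features['age']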
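Since the goal is a readable table, one option is to hand a batch to pandas. This is only a sketch: the `features` and `targets` values below are made-up stand-ins shaped like one element of `dict_slices.as_numpy_iterator()` (a dict of NumPy arrays plus an array of targets), not data from the tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical batch, shaped like one element of dict_slices.as_numpy_iterator():
# a dict mapping column name -> array of values, plus an array of targets.
features = {
    "age": np.array([63, 67, 67]),
    "sex": np.array([1, 1, 1]),
    "chol": np.array([233, 286, 229]),
}
targets = np.array([0, 1, 0])

# One DataFrame row per example, one column per feature, target appended last.
table = pd.DataFrame(features).assign(target=targets)
print(table.to_string(index=False))
```

With a real batch you would replace the stand-ins with the pair unpacked from the iterator.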
sakeesh

BatchDataset is a tf.data.Dataset instance that has been batched by calling its .batch(...) method. You cannot "index" a TensorFlow Dataset, or call the len function on it. I suggest iterating through it like you did in the first code snippet.
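A minimal sketch of the difference, assuming TensorFlow 2.x with eager execution. The `features` and `labels` values are small made-up stand-ins for the tutorial's DataFrame and target, not its actual data.

```python
import tensorflow as tf

# Hypothetical stand-ins for the tutorial's DataFrame columns and target.
features = {"age": [63, 67, 67, 37], "sex": [1, 1, 1, 1]}
labels = [0, 1, 0, 0]

dataset = tf.data.Dataset.from_tensor_slices((features, labels)).batch(2)

# Indexing is not supported: the dataset object is not subscriptable.
try:
    dataset[0]
    indexing_failed = False
except TypeError:
    indexing_failed = True

# Iterating works: each element is a (features_dict, labels) pair of tensors.
first_features, first_labels = next(iter(dataset))
ages = first_features["age"].numpy().tolist()
print(indexing_failed, ages, first_labels.numpy().tolist())
```

If you only need the first few batches, dataset.take(n) in a for loop does the same job as next(iter(dataset)).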

However, your dataset uses .to_dict('list'), which maps each key in the dictionary to a list of values. In other words, you have one "column" per key rather than rows. Is this what you want? It makes printing line by line (as in the table-printing example you linked) a lot more difficult, since the features of one row are spread across different lists. It also differs from the example in the official TensorFlow code, where a datapoint consists of multiple features, not one feature with multiple values.
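If you do want to keep the column-oriented dictionary, the columns can be transposed into rows before printing. The sketch below uses a plain dict of lists with made-up values as a stand-in for one batch; the same idea should apply to the arrays you get from a batch once they are converted with .numpy().

```python
# Hypothetical column-oriented batch: each key maps to a list of values.
batch = {"age": [63, 67, 37], "sex": [1, 1, 0], "chol": [233, 286, 250]}

columns = list(batch)
# zip(*...) transposes the per-column lists into per-row tuples.
rows = list(zip(*batch.values()))

print(*columns, sep="\t")
for row in rows:
    print(*row, sep="\t")
```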

Combining the TensorFlow code and pretty printing:

columns = list(df.columns) + ['target']
# batch(1), because otherwise each iteration below yields multiple row/target pairs
dict_slices = tf.data.Dataset.from_tensor_slices((df.values, target.values)).batch(1)

print(*columns, sep='\t')
for row, label in dict_slices.take(1):
    # Every element keeps a leading batch dimension of size 1, so index it away.
    print(*row.numpy()[0], label.numpy()[0], sep='\t')

This needs a bit of formatting, because column widths are not equal.
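One way to even out the column widths is to pad every cell to the width of the longest entry in its column. This is a rough sketch in plain Python; `format_table` is a hypothetical helper written for illustration, and the rows are made-up values.

```python
def format_table(columns, rows):
    """Return the table as a string with every column padded to equal width."""
    cells = [[str(c) for c in columns]] + [[str(v) for v in row] for row in rows]
    # Width of each column is the length of its longest cell, header included.
    widths = [max(len(line[i]) for line in cells) for i in range(len(columns))]
    return "\n".join(
        "  ".join(cell.rjust(width) for cell, width in zip(line, widths))
        for line in cells
    )

columns = ["age", "chol", "target"]
rows = [(63, 233, 0), (67, 286, 1)]
print(format_table(columns, rows))
```

Right-justifying suits numeric columns. Use ljust instead if you prefer left-aligned text.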

Andrea Angeli
  • Thank you for the input! With your modifications I'm getting one new warning and one new error: `**WARNING**:tensorflow:Model was constructed with shape (None,) for input Tensor("age:0", shape=(None,), dtype=float32), but it was called on an input with incompatible shape (None, 13).` And also `**AssertionError:** Could not compute output Tensor("dense_4/Identity:0", shape=(None, 1), dtype=float32)` – J. Fox May 08 '20 at 17:16
  • Those errors are not related to printing your datapoint as a formatted dict. These are about your constructed model and what its expected input shape looks like vs what you are feeding it. Open a separate question or show the relevant code and use-case. – Andrea Angeli May 08 '20 at 21:37