
I am aware that similar questions have been asked previously, but none of the proposed solutions seems to work for me. I have the following pandas DataFrame:

Title Author Target Tag0 Tag1 Tag2 Tag3 Tag4 Tag5 Tag6 Tag7 Tag8 Tag9
0 Says Ron Johnson referred to "The Lego Movie" as an "insidious anti-business conspiracy." 0 0 30 0 36 35 nan nan nan nan nan nan
1 "Forty percent of the Fortune 500 were started either by immigrants or children of immigrants." 1 0 9 21 5 28 nan nan nan nan nan nan

I have vectorised the Title attribute with the TextVectorization layer in Keras, obtaining the following DataFrame:

Title Author Target Tag0 Tag1 Tag2 Tag3 Tag4 Tag5 Tag6 Tag7 Tag8 Tag9
0 [9415, 19483, 9066, 16820, 20256, 6959, 6931,...,0 ] 0 0 3213 3829 223 3140 nan nan nan nan nan nan

I want to transform this Pandas dataframe to a TensorFlow dataset. I have tried to achieve this using the following code:

dataset = tf.data.Dataset.from_tensor_slices((data.values, target.values))

Here is the error I am getting:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).

Removing the Title column makes the error go away, so Title is the column that causes it. Title looks like this:

print(data["Title"].values)
array([array([ 9415., 19483.,  9066., 16820., 20256.,  6959.,  6931.,  8539.,
       10705.,  1342.,  1896.,  4353., 14143.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.],
       ...,
       array([17497., 20189.,  4280.,  3460., 20256., 15754.,  9178.,  1114.,
       19441., 18731., 13875., 14018.,  5789.,  6959.,  8740., 13042.,
         929.,  9541.,   773., 19384.,  5659., 13042., 14578.,  2813.,
       17452.,   888.,  6206.,  6959., 14540.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.,     0.,
           0.,     0.,     0.,     0.,     0.,     0.,     0.],
      dtype=float32)], dtype=object)

My question is: what is wrong with Title? What should I change?

I assume it is related to the data type of the outer numpy.ndarray that holds each title's numpy.ndarray: as can be seen above, it has dtype=object. But I am not really sure.
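
The dtype=object guess can be checked on a small scale. The following is a minimal sketch, assuming toy three-element vectors in place of the real padded titles (the names `rows`, `nested` and `stacked` are illustrative only). It shows that an array whose cells are arrays is one-dimensional with dtype=object, while `np.stack` turns equally long cells into a single numeric 2-D array:

```python
import numpy as np

# Two hypothetical, equally long "titles" standing in for the padded token vectors.
rows = [
    np.array([9415., 19483., 0.], dtype=np.float32),
    np.array([17497., 20189., 4280.], dtype=np.float32),
]

# Mimic what data["Title"].values holds: a 1-D object array whose cells are arrays.
nested = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    nested[i] = row

print(nested.dtype, nested.shape)    # object (2,)

# np.stack joins the cells along a new first axis into one numeric 2-D array.
stacked = np.stack(nested)
print(stacked.dtype, stacked.shape)  # float32 (2, 3)
```

If the cells had different lengths, `np.stack` would raise an error instead of returning a 2-D array.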

Thank you in advance!

Edit:

I found a workaround for this issue by simply converting the DataFrame to a NumPy ndarray.

# To NumPy ('<U43' is a fixed-width Unicode string dtype, up to 43 characters per cell)
numpy_dataset = data.to_numpy(dtype="<U43")

# Get Target
target = data.pop("Target")

# TF dataset
dataset = tf.data.Dataset.from_tensor_slices((numpy_dataset, target.values))
GGS
  • Each cell of the `Title` column is an array. `values` is then an array of arrays. Try `np.stack(data["Title"].values)`. If it raises an error, those nested arrays differ in shape, and cannot be made into a 2d numeric array (which `tensorflow` can use). – hpaulj Jan 13 '21 at 20:53
  • Great, that solved my problem, **but** only partially. As you can see in the code above, I pass the whole dataframe, not only `Title`. If I do what you suggested, `tf.data.Dataset.from_tensor_slices((np.stack(data["Title"].values), target.values))`, the `TensorFlow` dataset is created. But how can I include the remaining columns? – GGS Jan 13 '21 at 21:05
  • Other answers here : https://stackoverflow.com/questions/58636087/tensorflow-valueerror-failed-to-convert-a-numpy-array-to-a-tensor-unsupporte/75139312 – Skippy le Grand Gourou Jan 16 '23 at 20:24
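
On the follow-up question of how to keep the remaining columns: one option is to stack `Title` separately and pass it alongside the other columns as a dict of features. The following is a minimal sketch, assuming a toy frame with three-token titles and two Tag columns. The dict keys `"title"` and `"other"` are hypothetical names, and any NaN in the Tag columns is passed through unchanged:

```python
import numpy as np
import pandas as pd
import tensorflow as tf

# Toy stand-in for the vectorised frame: each Title cell is a fixed-length float32 array.
data = pd.DataFrame({
    "Title": [
        np.array([9415., 19483., 0.], dtype=np.float32),
        np.array([17497., 20189., 4280.], dtype=np.float32),
    ],
    "Author": [0, 1],
    "Target": [0, 0],
    "Tag0": [30.0, 9.0],
    "Tag1": [0.0, np.nan],
})

# Separate the label first so it does not end up among the features.
target = data.pop("Target")

# Stack the per-row arrays into one 2-D float array; the object column is gone afterwards.
titles = np.stack(data.pop("Title").to_numpy())

# The remaining columns are plain numbers and convert directly.
others = data.to_numpy(dtype="float32")

# A dict of arrays lets both feature groups travel together, sliced row by row.
dataset = tf.data.Dataset.from_tensor_slices(
    ({"title": titles, "other": others}, target.to_numpy())
)
print(dataset.element_spec)
```

Each element of the dataset is then a `(features, label)` pair. Concatenating `titles` and `others` with `np.concatenate(..., axis=1)` would be an alternative if a single feature matrix is preferred.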

2 Answers


I found a workaround for this issue by simply converting the DataFrame to a NumPy ndarray.

# To NumPy ('<U43' is a fixed-width Unicode string dtype, up to 43 characters per cell)
numpy_dataset = data.to_numpy(dtype="<U43")

# Get Target
target = data.pop("Target")

# TF dataset
dataset = tf.data.Dataset.from_tensor_slices((numpy_dataset, target.values))
GGS

I ran into the same problem when I tried the tf feature_columns.ipynb demo. I found that the data contained null values; after dropping them, the code worked.

    # drop null data
    dataframe = dataframe.dropna(axis=0, how='any')
sybil wu
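
One caveat when applying `dropna` to a frame like the one in the question, where some Tag columns are NaN in every row shown: `how='any'` removes each row that has at least one NaN. The following is a minimal sketch, assuming a toy three-row frame. The `fillna(-1.0)` alternative and its placeholder value are assumptions for illustration, not something the thread proposes:

```python
import numpy as np
import pandas as pd

# Toy frame shaped like the question's: Tag1 is NaN in every row, Tag0 in none.
df = pd.DataFrame({
    "Author": [0, 1, 2],
    "Tag0": [30.0, 9.0, 4.0],
    "Tag1": [np.nan, np.nan, np.nan],
})

# how='any' drops each row that has at least one NaN, so nothing survives here.
dropped = df.dropna(axis=0, how="any")
print(len(dropped))  # 0

# Assumed alternative: keep the rows and replace NaN with a placeholder value.
filled = df.fillna(-1.0)
print(len(filled), int(filled.isna().sum().sum()))  # 3 0
```

Whether dropping or filling is appropriate depends on what a missing tag means in the data.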