I am aware that similar questions have been asked previously, but none of the proposed solutions seems to work for me. I have the following pandas DataFrame:
| | Title | Author | Target | Tag0 | Tag1 | Tag2 | Tag3 | Tag4 | Tag5 | Tag6 | Tag7 | Tag8 | Tag9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Says Ron Johnson referred to "The Lego Movie" as an "insidious anti-business conspiracy." | 0 | 0 | 30 | 0 | 36 | 35 | nan | nan | nan | nan | nan | nan |
1 | "Forty percent of the Fortune 500 were started either by immigrants or children of immigrants." | 1 | 0 | 9 | 21 | 5 | 28 | nan | nan | nan | nan | nan | nan |
I vectorised the Title column with a Keras TextVectorization layer, obtaining the following DataFrame:
| | Title | Author | Target | Tag0 | Tag1 | Tag2 | Tag3 | Tag4 | Tag5 | Tag6 | Tag7 | Tag8 | Tag9 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [9415, 19483, 9066, 16820, 20256, 6959, 6931,...,0 ] | 0 | 0 | 3213 | 3829 | 223 | 3140 | nan | nan | nan | nan | nan | nan |
I want to convert this pandas DataFrame to a TensorFlow dataset. I tried to do this with the following code:
dataset = tf.data.Dataset.from_tensor_slices((data.values, target.values))
Here is the error I am getting:
ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type numpy.ndarray).
If I remove the Title column, the error goes away, so Title is the column causing it. Title looks like this:
print(data["Title"].values)
array([array([ 9415., 19483., 9066., 16820., 20256., 6959., 6931., 8539.,
10705., 1342., 1896., 4353., 14143., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.],
...,
array([17497., 20189., 4280., 3460., 20256., 15754., 9178., 1114.,
19441., 18731., 13875., 14018., 5789., 6959., 8740., 13042.,
929., 9541., 773., 19384., 5659., 13042., 14578., 2813.,
17452., 888., 6206., 6959., 14540., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0.],
dtype=float32)], dtype=object)
My question is: what is wrong with Title? What should I change?
I assume it is related to the data type of the outer numpy.ndarray that holds the per-title numpy.ndarray objects: as can be seen above, it has dtype=object. But I am not really sure.
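To make the dtype=object suspicion concrete, here is a minimal sketch using NumPy only. The `titles` array is a hypothetical, shortened stand-in for data["Title"].values, and it assumes every title vector has the same length (which padded TextVectorization output should have). It shows that the outer array is a 1-D array of Python objects, and that np.stack can turn it into a regular 2-D numeric array:

```python
import numpy as np

# Hypothetical stand-in for data["Title"].values: a 1-D object array whose
# elements are equal-length float32 vectors (values shortened for illustration).
titles = np.empty(2, dtype=object)
titles[0] = np.array([9415., 19483., 9066., 0.], dtype=np.float32)
titles[1] = np.array([17497., 20189., 4280., 3460.], dtype=np.float32)

print(titles.dtype, titles.shape)  # object (2,)

# np.stack copies the per-row vectors into one contiguous 2-D numeric array.
title_matrix = np.stack(titles)

print(title_matrix.dtype, title_matrix.shape)  # float32 (2, 4)
```

An array like `title_matrix` has a plain numeric dtype, so it is the kind of input tf.data.Dataset.from_tensor_slices should be able to convert. Whether it is passed on its own or alongside the remaining columns (for example as a dict of features) is a design choice that is not tested here.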
Thank you in advance!
Edit:
I found a workaround for this issue by simply converting the DataFrame to a NumPy ndarray:
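# Pop the target first so it is not included in the features
target = data.pop("Target")
# To NumPy (note: dtype="<U43" casts every value to a string)
numpy_dataset = data.to_numpy(dtype="<U43")
# TF dataset
dataset = tf.data.Dataset.from_tensor_slices((numpy_dataset, target.values))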
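One thing that may be worth checking about this workaround: "<U43" is a fixed-width Unicode string dtype, so the converted array holds text rather than numbers. The sketch below uses a hypothetical object array of a few numeric cells in place of the real DataFrame to show the effect of the cast:

```python
import numpy as np

# Hypothetical stand-in for a few numeric cells of the DataFrame.
cells = np.array([30.0, 0.0, 36.0], dtype=object)

# Same dtype string as in the workaround: a Unicode string of up to 43 characters.
as_text = cells.astype("<U43")

print(as_text.dtype.kind)  # U (Unicode string), not f (float)
```

If the model expects numeric inputs, the resulting dataset would probably need to be converted back to numbers, or built from numeric arrays in the first place.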