
We are trying to fine-tune a pretrained RoBERTa model using TensorFlow. For this we have to create a tf.data.Dataset from our dataframe.

The dataframe looks like this: [image: Traindata]

The three option columns hold encoded strings (each cell is a list), and the answer is an integer that corresponds to option A, B or C.

We try to build a tf.data.Dataset from this using:

import tensorflow as tf

features = ['OptionA', 'OptionB', 'OptionC']

training_dataset = (
    tf.data.Dataset.from_tensor_slices(
        (
            tf.cast(train_data[features].values, tf.float32),
            tf.cast(train_data['Answer'].values, tf.int32)
        )
    )
)

However, this does not work; we get the following error:

ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type list).

I have read that a list cannot be used as a tf.dtype, which is where we currently pass float32. But we also cannot convert the lists in the dataframe to floats.

If anyone could point us in the right direction we would be very grateful! Thanks in advance!

Sam V

1 Answer

features = np.asarray(features).astype('float32')

Should work.

Tensorflow - ValueError: Failed to convert a NumPy array to a Tensor (Unsupported object type float)

The top answer to the question linked above gives a more in-depth explanation.

    Thanks for the comment. I had read the linked post earlier as well, but unfortunately I get the error: "ValueError: could not convert string to float: 'OptionA'". This may be because of the column headers, but I'm not sure. – Sam V Dec 13 '20 at 09:02
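
A possible reading of the two errors, not confirmed anywhere in this thread: `features` is the list of column names, so `np.asarray(features).astype('float32')` tries to parse the string 'OptionA' as a number, and `train_data[features].values` is an object array whose cells are Python lists, which `tf.cast` cannot convert. The sketch below uses a small made-up dataframe in place of the real `train_data`, and assumes every cell is a list of integer token IDs already padded to the same length. It only shows one way to turn such columns into a regular numeric array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train_data. Assumption: each option cell holds
# a list of token IDs, and all lists have the same length.
train_data = pd.DataFrame({
    "OptionA": [[0, 11, 2], [0, 21, 2]],
    "OptionB": [[0, 12, 2], [0, 22, 2]],
    "OptionC": [[0, 13, 2], [0, 23, 2]],
    "Answer": [0, 2],
})
features = ["OptionA", "OptionB", "OptionC"]

# .values is a 2-D object array of Python lists; .tolist() unwraps it into
# plain nested lists, which NumPy can convert to a regular integer array
# of shape (rows, options, sequence_length).
inputs = np.array(train_data[features].values.tolist(), dtype=np.int32)
labels = train_data["Answer"].to_numpy(dtype=np.int32)

print(inputs.shape, inputs.dtype)
print(labels)
```

Arrays built this way have a regular numeric dtype, so a pair like `(inputs, labels)` is the kind of input `tf.data.Dataset.from_tensor_slices` accepts. If the lists differ in length, they would need padding first, and token IDs are usually kept as integers rather than cast to float32.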